GRAPH CONVOLUTIONAL NORMALIZING FLOWS FOR SEMI-SUPERVISED CLASSIFICATION & CLUSTERING

Anonymous

Abstract

Graph neural networks (GNNs) are discriminative models that directly model the class posterior p(y|x) for semi-supervised classification of graph data. While effective for prediction, as a representation learning approach a GNN often extracts node representations that miss information useful for clustering, because such information is not necessary for good classification. In this work, we replace the GNN layer with a combination of graph convolutions and normalizing flows under a Gaussian mixture representation space, which allows us to build a generative model that models both the class conditional likelihood p(x|y) and the class prior p(y). The resulting neural network, GC-Flow, enjoys two benefits: it retains the predictive power of graph convolutions, and it produces well-separated clusters in the representation space, owing to the structuring of the representations as a mixture of Gaussians. We demonstrate these benefits on a variety of benchmark data sets. Moreover, we show that additional parameterization, such as parameterizing the adjacency matrix used for graph convolutions, yields further improvement in clustering.

1. INTRODUCTION

Semi-supervised learning (Zhu, 2008) refers to learning a classification model from a typically small amount of labeled data together with a possibly large amount of unlabeled data. The unlabeled data, combined with additional assumptions (such as the manifold and smoothness assumptions), can significantly improve the accuracy of a classifier learned from few labeled examples. A typical example of such a model in the recent literature is the graph convolutional network (GCN) of Kipf & Welling (2017), which capitalizes on the graph structure underlying the data (considered an extension of a discretized manifold) to achieve effective classification. GCN, together with other pioneering parameterized models, has spawned a flourishing literature on graph neural networks (GNNs), which excel at node classification (Zhou et al., 2020; Wu et al., 2021).

However, driven by the classification task, GCN and other GNNs may not produce node representations that carry useful information for goals other than classification. For example, the representations do not always cluster well. This phenomenon is unsurprising: when one treats the penultimate activations as the data representations and the last dense layer as a linear classifier, the representations need only be close to linearly separable for accurate classification; they need not form well-separated clusters.

This observation leads to a natural question: can one build a representation model for graphs that is not only effective for classification but also unravels the inherent structure of the data for clustering? The answer is affirmative. One idea is to build, rather than a discriminative model p(y|x) as all GNNs do, a generative model p(x|y)p(y) whose class conditional likelihood is defined by explicitly modeling the representation space, for example as a mixture of well-separated unimodal distributions.
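As background for the graph-convolution component, the propagation rule of Kipf & Welling (2017) can be sketched in a few lines of NumPy. This is a minimal illustration with our own toy graph and weights, not the paper's implementation:

```python
import numpy as np

def gcn_layer(A, X, W):
    """One graph convolution: H = ReLU(D^{-1/2} (A + I) D^{-1/2} X W).

    A: (n, n) adjacency matrix, X: (n, d) node features,
    W: (d, d_out) learnable weights. Follows the symmetric
    normalization of Kipf & Welling (2017).
    """
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)                   # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # D^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt
    return np.maximum(A_norm @ X @ W, 0.0)  # ReLU nonlinearity

# Tiny 3-node path graph: 0 - 1 - 2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.eye(3)                 # one-hot input features
W = np.ones((3, 2))           # toy weight matrix
H = gcn_layer(A, X, W)
print(H.shape)  # (3, 2)
```

Each output row mixes a node's features with those of its neighbors, which is the inductive bias GC-Flow retains from GCN.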
Indeed, the recently proposed FlowGMM model (Izmailov et al., 2020) uses a normalizing flow to map the distribution of input features to a Gaussian mixture, resulting in well-structured clusters. This model, however, is not designed for graphs, and it underperforms GNNs that leverage the graph structure for classification.

In this work, we present graph convolutional normalizing flows (GC-Flows), a generative model that not only classifies well but also yields node representations that capture the inherent structure of the data, forming high-quality clusters. GC-Flows relate to both GCNs and FlowGMMs. On the one hand, GC-Flows equip each GCN layer with an invertible flow. This flow parameterization allows training the model by maximizing the likelihood that the data representations follow a Gaussian mixture, mitigating the poor clustering behavior of GCNs. On the other hand, GC-Flows augment a usual normalizing flow model (such as FlowGMM), which is trained on independent data, with graph convolutions as an inductive bias in the parameterization, boosting classification accuracy. In Figure 1, we visualize the nodes of a graph data set in the representation space using t-SNE. The visualization suggests that GC-Flow inherits the clustering effect of FlowGMM while being similarly accurate to GCN for classification.

A few key characteristics of GC-Flows are as follows:

1. A GC-Flow is a GNN, because when applied to graph data it computes node representations by using the graph structure. In contrast, a FlowGMM is not a GNN.
2. A GC-Flow is a generative model, admitting FlowGMMs as a special case when the graph is absent.
3. As a generative model, the training loss of GC-Flows involves both labeled and unlabeled data, similar to FlowGMMs, whereas that of GNNs involves only the labeled data.

Significance.
While classification is the dominant node-level task in the current GNN literature, the importance of clustering in capturing the inherent structure of data is undeniable. This work addresses a weakness of the current GNN literature, namely the separation of clusters. A Gaussian mixture representation space properly reflects this goal. The normalizing flow is the vehicle that parameterizes the feature transformation so as to encourage the formation of separated Gaussians, and as a generative model it can return data densities. Likelihood training is organically tied to a generative model, whereas existing methods based on clustering or contrastive losses externally encourage the GNN to produce clustered representations, without a notion of densities.
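To make the likelihood objective and the density-based prediction concrete, the following NumPy sketch (our own illustration under an identity-covariance Gaussian mixture, not the paper's code) shows the FlowGMM-style per-node likelihood and the Bayes-rule class posterior. Because all classes share the same flow, the log-Jacobian term `log_det_jac` cancels in the posterior:

```python
import numpy as np

def log_gauss(z, mu):
    """log N(z; mu, I) for a latent code z."""
    d = z.shape[-1]
    return -0.5 * np.sum((z - mu) ** 2) - 0.5 * d * np.log(2 * np.pi)

def node_log_likelihood(z, log_det_jac, mus, log_priors, y=None):
    """Per-node term of a semi-supervised likelihood.

    Labeled node:   log p(x, y) = log pi_y + log N(z; mu_y, I) + log|det J|
    Unlabeled node: log p(x)    = logsumexp_k [log pi_k + log N(z; mu_k, I)]
                                  + log|det J|
    so labeled and unlabeled nodes contribute to one shared objective.
    """
    comps = np.array([lp + log_gauss(z, mu)
                      for mu, lp in zip(mus, log_priors)])
    if y is not None:                       # labeled: the true mixture component
        return comps[y] + log_det_jac
    m = comps.max()                         # unlabeled: log-sum-exp over classes
    return m + np.log(np.exp(comps - m).sum()) + log_det_jac

def class_posterior(z, mus, log_priors):
    """p(y | x) by Bayes' rule; the shared Jacobian term cancels."""
    comps = np.array([lp + log_gauss(z, mu)
                      for mu, lp in zip(mus, log_priors)])
    comps -= comps.max()                    # numerical stability
    p = np.exp(comps)
    return p / p.sum()

mus = np.array([[-2.0, 0.0], [2.0, 0.0]])   # two well-separated class means
log_priors = np.log([0.5, 0.5])
z = np.array([1.5, 0.2])                    # flow output for one node
print(class_posterior(z, mus, log_priors))  # strongly favors the second class
```

Note that an externally imposed clustering loss would provide no analogue of `node_log_likelihood`: the density view is what lets unlabeled nodes contribute to training.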

2. RELATED WORK

Graph neural networks (GNNs) are machinery for producing node-level and graph-level representations from graph-structured data (Zhou et al., 2020; Wu et al., 2021). A popular class of GNNs are message passing neural networks (MPNNs) (Gilmer et al., 2017), which treat information from the neighborhood of a node as messages and recursively update the node representation by aggregating the neighborhood messages and combining the result with the past node representation. Many widely used GNNs can be considered a form of MPNN, such as GG-NN (Li et al., 2016), GCN (Kipf & Welling, 2017), GraphSAGE (Hamilton et al., 2017), GAT (Veličković et al., 2018), and GIN (Xu et al., 2019).

Normalizing flows are invertible neural networks that can transform a data distribution into a typically simple one, such as the normal distribution (Rezende & Mohamed, 2015; Kobyzev et al., 2021; Papamakarios et al., 2021). Because of invertibility, one may navigate between the input and output distributions for purposes such as estimating densities and sampling new data. The densities of the two distributions are related by the change-of-variable formula, which involves the Jacobian determinant of the flow. Computing the Jacobian determinant is costly in general; thus, many proposed neural flow architectures are designed so that the Jacobian is structured (e.g., triangular) and its determinant is cheap to compute.
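As an illustration of how such architectures keep the Jacobian determinant cheap, here is a minimal NumPy sketch of a RealNVP-style affine coupling layer; the toy lambda "networks" stand in for the learned scale and shift functions:

```python
import numpy as np

def affine_coupling_forward(x, scale_net, shift_net):
    """One affine coupling layer.

    Splits x into halves (x1, x2); x1 passes through unchanged and
    parameterizes an elementwise affine map of x2. The Jacobian is
    triangular, so its log-determinant is just the sum of the log-scales.
    """
    d = x.shape[-1] // 2
    x1, x2 = x[..., :d], x[..., d:]
    s = scale_net(x1)                      # log-scales
    t = shift_net(x1)                      # shifts
    z2 = x2 * np.exp(s) + t
    z = np.concatenate([x1, z2], axis=-1)
    log_det = s.sum(axis=-1)               # cheap Jacobian log-determinant
    return z, log_det

def affine_coupling_inverse(z, scale_net, shift_net):
    """Exact inverse: undo the affine map using the untouched half."""
    d = z.shape[-1] // 2
    z1, z2 = z[..., :d], z[..., d:]
    s, t = scale_net(z1), shift_net(z1)
    x2 = (z2 - t) * np.exp(-s)
    return np.concatenate([z1, x2], axis=-1)

# Toy "networks": any functions of the untouched half work.
scale_net = lambda h: np.tanh(h)           # keep log-scales bounded
shift_net = lambda h: 0.5 * h

x = np.array([0.3, -1.2, 2.0, 0.7])
z, log_det = affine_coupling_forward(x, scale_net, shift_net)
x_rec = affine_coupling_inverse(z, scale_net, shift_net)
print(np.allclose(x, x_rec))  # True: the layer is exactly invertible
```

Stacking such layers while alternating which half passes through yields an expressive flow whose total log-determinant is a sum of these cheap per-layer terms.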



Figure 1: Representation space of the data set Cora under different models, visualized by t-SNE. Coloring indicates ground-truth labeling. Silhouette coefficients measure cluster separation. Micro-F1 scores measure classification accuracy.

