VECODER - VARIATIONAL EMBEDDINGS FOR COMMUNITY DETECTION AND NODE REPRESENTATION

Abstract

In this paper, we study how to simultaneously learn two highly correlated tasks of graph analysis, i.e., community detection and node representation learning. We propose an efficient generative model called VECODER for jointly learning Variational Embeddings for Community Detection and node Representation. VECODER assumes that every node can be a member of one or more communities. The node embeddings are learned so that connected nodes are not only "closer" to each other but also share similar community assignments. A joint learning framework leverages these community-aware node embeddings for better community detection. We demonstrate on several graph datasets that VECODER outperforms many competitive baselines on all three tasks, i.e., node classification, overlapping community detection, and non-overlapping community detection. We also show that VECODER is computationally efficient and that its performance is robust to varying hyperparameters.

1. INTRODUCTION

Graphs are flexible data structures that model complex relationships among entities, with data points as nodes and the relations between them as edges. One important task in graph analysis is community detection, where the objective is to cluster nodes into multiple groups (communities), each a set of densely connected nodes. Communities can be overlapping or non-overlapping, depending on whether they share nodes. Several algorithmic (Ahn et al., 2010; Derényi et al., 2005) and probabilistic approaches (Gopalan & Blei, 2013; Leskovec & Mcauley, 2012; Wang et al., 2017; Yang et al., 2013) to community detection have been proposed. Another fundamental task in graph analysis is learning node embeddings, which can then be used for downstream tasks such as graph visualization (Tang et al., 2016; Wang et al., 2016; Gao et al., 2011; Wang et al., 2017) and classification (Cao et al., 2015; Tang et al., 2015). In the literature, these tasks are usually treated separately. Although standard graph embedding methods capture basic connectivity, the node embeddings are learned independently of community detection. For instance, a simple approach is to obtain node embeddings via DeepWalk (Perozzi et al., 2014) and then assign each node to a community using k-means or a Gaussian mixture model. Conversely, methods such as BigCLAM (Yang & Leskovec, 2013), which focus on finding the community structure in a dataset, perform poorly on node representation tasks such as node classification. This motivates us to study approaches that jointly learn community-aware node embeddings. Recently, several approaches, such as CNRL (Tu et al., 2018), ComE (Cavallari et al., 2017), and vGraph (Sun et al., 2019), have been proposed to learn node embeddings and detect communities simultaneously in a unified framework.
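The two-stage baseline mentioned above (embed first, cluster afterwards) can be sketched as follows. This is a minimal illustration under stated assumptions, not DeepWalk itself: the truncated random walks follow DeepWalk, but in place of skip-gram training the sketch uses row-normalised window co-occurrence counts as a stand-in embedding, then clusters them with plain Lloyd's k-means seeded by a deterministic farthest-first initialisation. The helper names (`random_walks`, `cooccurrence_embeddings`, `kmeans`) and the toy graph (two 4-cliques joined by the edge 3-4) are hypothetical. The relevant property is that the embedding step never sees the community assignments.

```python
import random

# Hypothetical two-stage baseline: embed nodes first, cluster afterwards.
# The embedding step is a simplified stand-in for DeepWalk, NOT its
# skip-gram training.

def random_walks(adj, num_walks, walk_len, rng):
    """DeepWalk-style truncated random walks started from every node."""
    walks = []
    nodes = sorted(adj)
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for start in nodes:
            walk = [start]
            while len(walk) < walk_len and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks

def cooccurrence_embeddings(walks, nodes, window):
    """Stand-in embedding: each node's row-normalised co-occurrence counts
    with every node inside a sliding window over the walks."""
    index = {n: i for i, n in enumerate(nodes)}
    emb = {n: [0.0] * len(nodes) for n in nodes}
    for walk in walks:
        for i, u in enumerate(walk):
            for j in range(max(0, i - window), min(len(walk), i + window + 1)):
                if j != i:
                    emb[u][index[walk[j]]] += 1.0
    for n in nodes:
        total = sum(emb[n]) or 1.0
        emb[n] = [x / total for x in emb[n]]
    return emb

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    keys = sorted(points)
    centers = [points[keys[0]][:]]
    while len(centers) < k:
        far = max(keys, key=lambda q: min(sqdist(points[q], c) for c in centers))
        centers.append(points[far][:])
    assign = {}
    for _ in range(iters):
        for q in keys:
            dists = [sqdist(points[q], c) for c in centers]
            assign[q] = dists.index(min(dists))
        for c in range(k):
            members = [points[q] for q in keys if assign[q] == c]
            if members:  # keep the old centre if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

# Toy graph: two 4-cliques {0,1,2,3} and {4,5,6,7} joined by the edge 3-4.
adj = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
       4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6]}
rng = random.Random(0)
walks = random_walks(adj, num_walks=20, walk_len=10, rng=rng)
emb = cooccurrence_embeddings(walks, sorted(adj), window=2)
labels = kmeans(emb, k=2, iters=10)
print(labels)
```

On this toy graph the two cliques should land in different clusters. On real graphs, however, the clusters can only be as good as embeddings that were trained without any knowledge of communities, which is the limitation that motivates joint learning.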
Several studies have shown that community detection improves when node representations are incorporated into the learning process (Cao et al., 2015; Kozdoba & Mannor, 2015). The intuition is that the global structure of the graph learned during community detection can provide useful context for node embeddings, and vice versa. The joint learning methods (CNRL, ComE, and vGraph) learn two embeddings for each node. The first is used for the node representation task. The second is the "context" embedding of the node, which aids in community detection. As CNRL and ComE are based on Skip-Gram (Mikolov et al., 2013) and DeepWalk (Perozzi et al., 2014), they inherit the "context" embedding from these models to learn the neighbourhood information of each node. vGraph also requires two
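To make the two-embedding design concrete, the following is a minimal sketch of skip-gram with negative sampling (the word2vec objective that DeepWalk applies to random walks) on a graph. Each node keeps a `node` vector, used as its representation, and a separate `context` vector, used when it appears in another node's neighbourhood; the score of a pair (u, v) is σ(φ_u · φ'_v). The class and function names, the uniform initialisation of both tables, the uniform negative sampling, and the hyperparameters are illustrative assumptions; this is not the CNRL, ComE, or vGraph code.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class SkipGramTables:
    """Hypothetical container for the two per-node embedding tables."""

    def __init__(self, num_nodes, dim, seed=0):
        rng = random.Random(seed)
        # Small random init for both tables (an assumption; word2vec itself
        # zero-initialises the context table).
        self.node = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)]
                     for _ in range(num_nodes)]
        self.context = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)]
                        for _ in range(num_nodes)]

    def score(self, u, v):
        """sigma(node[u] . context[v]): how likely v is in u's context."""
        return sigmoid(dot(self.node[u], self.context[v]))

    def sgd_step(self, u, v, label, lr):
        """One gradient-ascent step on the negative-sampling log-likelihood;
        label=1 for an observed pair, 0 for a sampled negative."""
        g = lr * (label - self.score(u, v))
        phi_u, phi_v = self.node[u], self.context[v]
        # Both updates use the old vectors (simultaneous update).
        self.node[u] = [a + g * b for a, b in zip(phi_u, phi_v)]
        self.context[v] = [b + g * a for a, b in zip(phi_u, phi_v)]

def train(tables, edges, num_nodes, epochs, negatives, lr, seed=0):
    """Treat each undirected edge as two (centre, context) pairs and draw
    `negatives` uniformly random context nodes per pair (an assumption;
    word2vec samples from a smoothed unigram distribution)."""
    rng = random.Random(seed)
    for _ in range(epochs):
        for u, v in edges:
            for a, b in ((u, v), (v, u)):
                tables.sgd_step(a, b, 1, lr)
                for _ in range(negatives):
                    tables.sgd_step(a, rng.randrange(num_nodes), 0, lr)

# Toy graph: two 4-cliques joined by the edge 3-4 (hypothetical example).
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7)]
tables = SkipGramTables(num_nodes=8, dim=16, seed=0)
train(tables, edges, num_nodes=8, epochs=200, negatives=3, lr=0.05)
print(round(tables.score(0, 1), 3), round(tables.score(0, 7), 3))
```

Only `tables.node` would be used as the node representation; `tables.context` exists to model neighbourhoods. The joint methods described above build their community components on top of an objective of this kind, and this sketch covers only the shared skip-gram part.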

