Wasserstein diffusion on graphs with missing attributes

Abstract

Many real-world graphs are attributed graphs, in which nodes are associated with non-topological features. While attributes can be missing anywhere in an attributed graph, most existing node representation learning approaches do not account for such incomplete information. In this paper, we propose a general non-parametric framework to mitigate this problem. Starting from a decomposition of the attribute matrix, we transform node features into discrete distributions in a lower-dimensional space equipped with the Wasserstein metric. On this Wasserstein space, we propose Wasserstein graph diffusion to smooth the distribution representations of nodes with information from their local neighborhoods. This allows us to reduce the distortion caused by missing attributes and obtain integrated representations expressing both topological and attribute information. We then pull the nodes back to the original space and produce corresponding point representations to facilitate various downstream tasks. To show the power of our representation method, we design two algorithms based on it, for node classification (with missing attributes) and matrix completion respectively, and demonstrate their effectiveness in experiments.

1. Introduction

Many real-world networks are attributed networks, where nodes are not only connected with other nodes but also associated with features, e.g., social network users with profiles or keywords showing interests, Internet Web pages with content information, etc. Learning node representations underlies various downstream graph-based learning tasks and has attracted much attention (Perozzi et al., 2014; Grover & Leskovec, 2016; Pimentel et al., 2017; Duarte et al., 2019). A high-quality node representation is able to express both node-attribute and graph-structure information and can better capture meaningful latent information. Random-walk-based graph embedding approaches (Perozzi et al., 2014; Grover & Leskovec, 2016) exploit graph structure to preserve pre-specified node similarities in the embedding space and have proven successful in various applications based on plain graphs. In addition, graph neural networks, many of which build on the message-passing scheme (Gilmer et al., 2017), aggregate information from neighborhoods and allow us to incorporate attribute and structure information effectively. However, most of these methods, which embed nodes into a lower-dimensional Euclidean space, suffer from a common limitation: they fail to model complex patterns or capture complicated latent information, owing to the limited representation capacity of the embedding space. There has recently been a tendency to embed nodes into a more complex target space in an attempt to increase the ability to express composite information. A prominent example is Wasserstein embedding, which represents nodes as probability distributions (Bojchevski & Günnemann, 2018; Muzellec & Cuturi, 2018; Frogner et al., 2019) equipped with the Wasserstein metric. A common practice is to learn a mapping from the original space to the Wasserstein space by minimizing distortion, but the objective functions are usually difficult to optimize and require expensive computations.
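For concreteness, the Wasserstein distance between two discrete distributions (weighted point clouds) can be computed exactly as a small linear program over transport plans. The following sketch illustrates this with SciPy; the function name and setup are ours, not from any of the cited works.

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein_discrete(x, y, a, b):
    """2-Wasserstein distance between discrete distributions
    sum_i a_i * delta(x_i) and sum_j b_j * delta(y_j),
    solved exactly as a linear program over transport plans."""
    n, m = len(a), len(b)
    # squared Euclidean ground cost between support points
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    # equality constraints: row sums of the plan equal a, column sums equal b
    A_eq = []
    for i in range(n):
        row = np.zeros((n, m)); row[i, :] = 1.0
        A_eq.append(row.ravel())
    for j in range(m):
        col = np.zeros((n, m)); col[:, j] = 1.0
        A_eq.append(col.ravel())
    res = linprog(C.ravel(), A_eq=np.array(A_eq),
                  b_eq=np.concatenate([a, b]), bounds=(0, None))
    return np.sqrt(res.fun)
```

The linear program has n·m variables, which is tractable for the small supports that arise after dimension reduction but motivates faster (e.g., entropic) approximations at scale.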
On the other hand, most representation learning methods depend heavily on the completeness of observed node attributes, which are often partially absent or even entirely inaccessible in real-life graph data. For instance, in social networks such as Facebook and Twitter, personal information is frequently incomplete because users are unwilling to disclose it out of privacy concerns. Consequently, representation learning models that require fully observed attributes may not be able to cope with these types of real-world networks. In this paper, we propose a novel non-parametric framework to mitigate this problem. Starting from a decomposition of the attribute matrix, we transform node features into discrete distributions in a lower-dimensional space equipped with the Wasserstein metric, implicitly performing dimension reduction, which greatly reduces computational complexity. Preserving node similarity is a common precondition for incorporating structural information into representation learning. Based on this, we develop a Wasserstein graph diffusion process to effectively propagate a node's distribution to its neighborhood and preserve node similarity in the Wasserstein space. To some extent, this diffusion operation implicitly compensates for the loss of information by aggregating information from neighbors. We thereby reduce the distortion caused by missing attributes and obtain integrated node representations expressing both node attributes and graph structure. In addition to producing distribution representations, our framework can leverage the inverse mapping to transform the node distributions back into node features (point representations). Experimentally, we show that these node features are efficient node representations and well-suited to various downstream learning tasks.
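To make the diffusion idea concrete, the sketch below shows a minimal Euclidean analogue of such neighborhood smoothing: representations are repeatedly mixed with a degree-normalized average of their neighbors, so a node with missing attributes inherits information from its neighborhood. The actual method averages in Wasserstein space (via barycentric interpolation of distributions) rather than by ordinary weighted sums; the function name and teleport-style damping parameter here are ours.

```python
import numpy as np

def diffusion_smooth(A, H, alpha=0.85, steps=10):
    """Iteratively smooth node representations H over graph A:
    H_t <- (1 - alpha) * H_0 + alpha * A_hat @ H_{t-1},
    where A_hat is the symmetrically normalized adjacency.
    Euclidean stand-in for the Wasserstein-space averaging in the text."""
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    A_hat = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    H0, Ht = H.copy(), H.copy()
    for _ in range(steps):
        Ht = (1 - alpha) * H0 + alpha * (A_hat @ Ht)
    return Ht
```

On a node whose attribute row is entirely missing (e.g., zeroed out), the smoothed representation is pulled toward those of its neighbors, which is the compensation effect described above.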
More precisely, to comprehensively investigate the representation ability, we examine our framework on node classification under two missing-data regimes: partially missing attributes and entirely missing node attributes. Moreover, we adapt our framework to matrix completion to demonstrate its ability to recover absent values. Contributions. We develop a novel non-parametric framework for node representation learning that utilizes incomplete node-attribute information. The contributions of our framework are: 1. embedding nodes into a low-dimensional discrete Wasserstein space through matrix decomposition; 2. reducing the distortion caused by incomplete information and producing effective distribution representations that express both attribute and structure information through the Wasserstein graph diffusion process; 3. reconstructing node features that can be used for various downstream tasks as well as for matrix completion.
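Contribution 1 turns a matrix decomposition into distribution representations. The section above does not yet specify which factorization is used, so the following sketch uses standard multiplicative-update NMF (Lee & Seung) purely for illustration: each node's nonnegative factor row, normalized to the simplex, defines a discrete distribution over the shared atoms.

```python
import numpy as np

def attribute_to_distributions(X, k, iters=200, seed=0):
    """Illustrative sketch (not the paper's exact decomposition):
    factor a nonnegative attribute matrix X ≈ W @ H with multiplicative
    updates, then normalize each row of W to sum to 1 so that node i is
    represented by the discrete distribution sum_r weights[i, r] * delta(H[r])
    over the k shared atoms H[0], ..., H[k-1]."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.random((n, k))
    H = rng.random((k, d))
    eps = 1e-12
    for _ in range(iters):
        # Lee & Seung multiplicative updates (preserve nonnegativity)
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    weights = W / W.sum(axis=1, keepdims=True)
    return weights, H  # weights[i] is a distribution over the k atoms
```

Because the atoms H live in a k-by-d space with k much smaller than n, subsequent Wasserstein computations between nodes operate on small supports, which is the dimension-reduction benefit mentioned above.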

Graph representation learning

In this paper, we focus on learning node representations on attributed graphs. There are many effective graph embedding approaches, such as DeepWalk (Perozzi et al., 2014), node2vec (Grover & Leskovec, 2016), and GenVetor (Duarte et al., 2019), which embed nodes into a lower-dimensional Euclidean space and preserve graph structure, but most of them disregard informative node attributes. So far, little attention has been paid to attribute information (Yang et al., 2015; Gao & Huang, 2018; Hong et al., 2019). The advent of graph neural networks (Bruna et al., 2014; Kipf & Welling, 2017; Hamilton et al., 2017; Veličković et al., 2017; Gilmer et al., 2017; Klicpera et al., 2019a;b) fills this gap to some extent, by defining graph convolutional operations in the spectral domain or aggregating neighborhood information in the spatial domain. Their learned node representations integrate both node attributes and graph structure information. Due to the expressive limitations of Euclidean space, embedding into more complex target spaces has been explored. Several works leverage distributions equipped with the Wasserstein distance to model complex data, since probability distributions are well-suited to modeling the uncertainty and flexibility of complex networks. For instance, Graph2Gauss (Bojchevski & Günnemann, 2018) represents nodes as Gaussian distributions such that uncertainty and complex interactions across nodes are reflected in the embeddings. Similarly, Muzellec & Cuturi (2018) presented a framework in which point embeddings can be thought of as a particular case of elliptical distribution embeddings. Frogner et al. (2019) learn minimum-distortion embeddings in a discrete distribution space such that the underlying distances of the input space are approximately preserved.

