Wasserstein diffusion on graphs with missing attributes

Abstract

Many real-world graphs are attributed graphs, in which nodes are associated with non-topological features. Although attributes can be missing anywhere in an attributed graph, most existing node representation learning approaches do not account for such incomplete information. In this paper, we propose a general non-parametric framework to mitigate this problem. Starting from a decomposition of the attribute matrix, we transform node features into discrete distributions in a lower-dimensional space equipped with the Wasserstein metric. On this Wasserstein space, we propose Wasserstein graph diffusion to smooth the distribution representations of nodes using information from their local neighborhoods. This reduces the distortion caused by missing attributes and yields integrated representations that express both topological structure and attribute information. We then pull the nodes back to the original space and produce corresponding point representations to facilitate various downstream tasks. To demonstrate the power of our representation method, we design two algorithms based on it, for node classification (with missing attributes) and matrix completion respectively, and show their effectiveness in experiments.

1. Introduction

Many real-world networks are attributed networks, where nodes are not only connected to other nodes but also associated with features, e.g., social network users with profiles or keywords indicating interests, Web pages with content information, etc. Learning node representations underlies various downstream graph-based learning tasks and has attracted much attention (Perozzi et al., 2014; Grover & Leskovec, 2016; Pimentel et al., 2017; Duarte et al., 2019). A high-quality node representation expresses both node-attribute and graph-structure information and can better capture meaningful latent information. Random-walk-based graph embedding approaches (Perozzi et al., 2014; Grover & Leskovec, 2016) exploit graph structure to preserve pre-specified node similarities in the embedding space and have proven successful in various applications on plain graphs. In addition, graph neural networks, many of which are based on the message passing schema (Gilmer et al., 2017), aggregate information from neighborhoods and allow us to incorporate attribute and structure information effectively. However, most of these methods, which embed nodes into a lower-dimensional Euclidean space, suffer from a common limitation: owing to the limited representation capacity of the embedding space, they fail to model complex patterns or capture complicated latent information. There has recently been a tendency to embed nodes into more complex target spaces in an attempt to increase the capacity to express composite information. A prominent example is Wasserstein embedding, which represents nodes as probability distributions (Bojchevski & Günnemann, 2018; Muzellec & Cuturi, 2018; Frogner et al., 2019) equipped with the Wasserstein metric. A common practice is to learn a mapping from the original space to the Wasserstein space by minimizing distortion, but the objective functions are usually difficult to optimize and computationally expensive.
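As a concrete illustration of the Wasserstein metric on discrete distributions (a background notion used above, not the embedding method itself), consider the one-dimensional case with equally weighted point masses: the optimal transport plan simply matches the sorted supports, so the Wasserstein-1 distance reduces to an average of pairwise gaps. The function below is a minimal sketch of this special case; the function name and inputs are illustrative.

```python
def w1_empirical(xs, ys):
    """Wasserstein-1 distance between two 1-D empirical distributions
    with the same number of equally weighted point masses.

    In 1-D the optimal coupling matches samples in sorted order, so
    W1 = (1/n) * sum_i |x_(i) - y_(i)|, where x_(i), y_(i) are the
    i-th order statistics of the two supports.
    """
    if len(xs) != len(ys):
        raise ValueError("supports must have equal size for this sketch")
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

# Moving each of the three unit masses one step to the right costs 1 on average.
print(w1_empirical([0.0, 1.0, 3.0], [1.0, 2.0, 4.0]))  # 1.0
```

General discrete distributions (unequal weights, higher dimensions) require solving a linear program or an entropic approximation, which is precisely why Wasserstein-space objectives are expensive to optimize.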
On the other hand, most representation learning methods depend heavily on the completeness of the observed node attributes, which are often partially missing or even entirely inaccessible in real-life graph data. For instance, in the case of social networks like Facebook and Twitter, in

