GRAPHSAD: LEARNING GRAPH REPRESENTATIONS WITH STRUCTURE-ATTRIBUTE DISENTANGLEMENT

Abstract

Graph Neural Networks (GNNs) learn effective node/graph representations by aggregating the attributes of neighboring nodes, which commonly yields a single representation that mixes the information of graph structure and node attributes. However, these two kinds of information might be semantically inconsistent and could be useful for different tasks. In this paper, we aim to learn node/graph representations with Structure-Attribute Disentanglement (GraphSAD). We propose to disentangle graph structure and node attributes into two distinct sets of representations, and such disentanglement can be done in either the input or the embedding space. We further design a metric to quantify the extent of such disentanglement. Extensive experiments on multiple datasets show that our approach can indeed disentangle the semantics of graph structure and node attributes, and it achieves superior performance on both node and graph classification tasks.

1. INTRODUCTION

Representing nodes or entire graphs with informative low-dimensional feature vectors plays a crucial role in many real-world applications and domains, e.g. user analysis in social networks (Tan et al., 2011; Yan et al., 2013), relational inference in knowledge graphs (Bordes et al., 2013; Trouillon et al., 2016; Sun et al., 2019), molecular property prediction in drug/material discovery (Gilmer et al., 2017; Wu et al., 2018) and circuit response prediction in circuit design (Zhang et al., 2019). Recently, Graph Neural Networks (GNNs) (Kipf & Welling, 2017; Velickovic et al., 2018; Xu et al., 2019) have shown their superiority in many different tasks. In general, the essential idea of these methods is to learn effective node representations (or graph representations with an additional graph pooling) through aggregating the attributes of each node and its neighbors in an iterative and nonlinear way. For an attributed graph, GNNs commonly encode the information of its graph structure and node attributes into a single representation. This might be problematic, since the semantic space of graph structure and node attributes might not be well aligned, and these two types of information could be useful for different tasks. For example, predicting the health condition of a user mainly depends on his/her profile information, and the social network does not provide much meaningful information; in another case, the prediction of a user's social class mainly relies on his/her social network structure. Therefore, a more reasonable solution is to disentangle these two types of information into two distinct sets of representations, whose importance can be further determined by downstream tasks. Such disentangled representations have been shown to benefit a model's generalization ability and interpretability (Chen et al., 2016; Higgins et al., 2017; Alemi et al., 2017).
Recently, DisenGNN (Ma et al., 2019) studied disentangled node representation learning by grouping the neighbors of each node into different channels, each channel corresponding to a different latent factor. In other words, DisenGNN focuses on disentangling the various latent factors of the graph structure. By contrast, our work intends to disentangle the representations of graph structure and node attributes, which is orthogonal to their work and also more general. In this paper, we aim to learn node/graph representations with Structure-Attribute Disentanglement (GraphSAD). As a first attempt, we conduct disentanglement in the input space, named Input-SAD, which separates a graph into a structure component and an attribute component and then encodes the two components separately. However, since graph structure and node attributes are not completely independent, it is better to suppress the dependency between these two factors in the embedding space instead of directly separating the input graph. Motivated by this observation, we propose to distill a
graph's structure and attribute information into distinct channels of the embedding vectors, named Embed-SAD. Concretely, for each node embedding, half of its elements capture the graph structure through edge reconstruction, and the other half extracts the attribute information by minimizing its mutual information with the structure counterpart while preserving semantic discriminability. In addition, we devise a metric, denoted SAD-Metric, to quantitatively evaluate the structure-attribute disentanglement of graph representations; it measures the sensitivity of a model when either the graph structure or the node attributes of an input graph are varied.

We summarize our contributions as follows:

• We study structure-attribute disentangled node/graph representation learning by separating graph structure and node attributes in either the input or the embedding space.
• We design a quantitative metric to measure the extent of structure-attribute disentanglement, which is novel in its graph-specific data processing scheme.
• By combining the proposed disentangling techniques with various GNNs, we empirically verify our method's superior performance on both node and graph classification benchmark datasets. We also analyze the disentangled graph representations via the proposed metric and qualitative visualization.
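The Embed-SAD channel split described above can be sketched as follows. This is a minimal illustration rather than the paper's implementation: `split_embedding` and `edge_score` are hypothetical helper names, and the inner-product decoder with a sigmoid is one common choice for edge reconstruction on the structure half.

```python
import numpy as np

def split_embedding(z):
    # Split a node embedding into a structure half and an attribute half
    # (hypothetical helper; the split point is simply the midpoint).
    d = z.shape[-1] // 2
    return z[..., :d], z[..., d:]

def edge_score(z_s_u, z_s_v):
    # Edge-reconstruction score computed on the structure halves of two
    # nodes: an inner-product decoder followed by a sigmoid, giving a
    # link probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-float(z_s_u @ z_s_v)))
```

The attribute half would additionally be trained to minimize its mutual information with the structure half while remaining predictive of the downstream labels, as described above.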

2.1. PROBLEM DEFINITION

We study learning node representations (e.g. in social networks) or whole-graph representations (e.g. of molecular graphs) for attributed graphs. Formally, we denote an attributed graph as G = (V, E, A). V denotes the set of nodes. E = {(u, v, t_uv)} is the set of edges, with t_uv the type of the edge connecting nodes u and v (e.g. different types of bonds in molecular graphs). A = {A_v | v ∈ V} represents the set of node attributes. Our goal is to learn meaningful representations for each node or the whole graph. Existing GNNs typically mix both the graph structure and node attributes into a unified representation through neural message passing. However, in practice, these two types of information may encode different semantics and be useful for different tasks. Take prediction on social networks as an example. When predicting the social class of users, the graph structure plays a more important role than user attributes, while user attributes are far more informative than graph structure when forecasting users' health conditions. It is therefore desirable to disentangle the information of graph structure and node attributes into different sets of representations and let the downstream task determine their importance. Specifically, we define our problem as follows:

Node/Graph Representation Learning with Structure-Attribute Disentanglement. Given an attributed graph G = (V, E, A), we aim to learn node (or whole-graph) representations by disentangling the semantics of the graph structure S = {V, E} and the node attributes A into two distinct sets of representations, one capturing the structure and one capturing the attributes. The importance of the two kinds of representations is further determined by the downstream task, such as node or graph classification.
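The separation of G = (V, E, A) into a structure part S = {V, E} and an attribute part A can be sketched as below. This is a minimal sketch under simplifying assumptions (dense adjacency matrix, numpy arrays); `split_graph` is a hypothetical helper, and masking attributes with a constant / dropping edges down to self-loops is just one plausible way to build the two views.

```python
import numpy as np

def split_graph(adj, attrs):
    # Structure view: keep the edges, mask node attributes to a constant,
    # so an encoder of this view can only exploit S = {V, E}.
    n = adj.shape[0]
    structure_view = (adj.copy(), np.ones((n, 1)))
    # Attribute view: keep node attributes, drop all edges (self-loops
    # only), so an encoder of this view can only exploit A.
    attribute_view = (np.eye(n), attrs.copy())
    return structure_view, attribute_view
```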

2.2. PRELIMINARIES

Graph Neural Networks (GNNs). A GNN maps each node v ∈ V to an embedding vector z_v and can also encode the entire graph G as a vector z_G. For an L-layer GNN, the L-hop information surrounding each node is captured via a neighborhood aggregation mechanism. Formally, the l-th GNN layer can be defined as:

z_v^(l) = COMBINE^(l) ( z_v^(l-1), AGGREGATE^(l) ( { ( z_u^(l-1), t_uv ) : u ∈ N(v) } ) ),   (1)

where N(v) is the set of node v's neighbors, t_uv denotes the edge attribute, z_v^(l) denotes the representation of v at the l-th layer, and z_v^(0) is initialized with the node attribute A_v. Using all the node embeddings in a graph, the entire graph's embedding can be derived by a permutation-invariant readout function:

z_G = READOUT ( { z_v^(L) | v ∈ V } ).   (2)
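A minimal numpy sketch of one such neighborhood-aggregation layer and a readout, assuming mean aggregation and a ReLU-activated linear COMBINE (one common instantiation; edge types t_uv are omitted for brevity, and the function names are illustrative):

```python
import numpy as np

def gnn_layer(adj, z, w_self, w_neigh):
    # AGGREGATE: mean over neighbors' representations, read off a dense
    # adjacency matrix. COMBINE: linear maps of the node's own embedding
    # and the aggregated message, followed by a ReLU nonlinearity.
    deg = adj.sum(axis=1, keepdims=True).clip(min=1.0)
    neigh = (adj @ z) / deg
    return np.maximum(z @ w_self + neigh @ w_neigh, 0.0)

def readout(z_nodes):
    # Permutation-invariant readout: the mean over all node embeddings
    # yields the whole-graph embedding z_G.
    return z_nodes.mean(axis=0)
```

Because the mean is symmetric in its arguments, reordering the nodes leaves z_G unchanged, which is the permutation invariance required of the readout function.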

