GRAPHCG: UNSUPERVISED DISCOVERY OF STEERABLE FACTORS IN GRAPHS

Abstract

Deep generative models have been widely developed for graph data such as molecular graphs and point clouds. Yet, much less investigation has been carried out on understanding the learned latent space of deep graph generative models. Such an understanding can open up a unified perspective and provide guidelines for essential tasks like controllable generation. To this end, this work develops a method called GraphCG for the unsupervised discovery of steerable factors in the latent space of deep graph generative models. We first examine the representation space of recent deep generative models trained on graph data, and observe that the learned representation space is not perfectly disentangled. Thus, our method is designed to discover steerable factors of graph data in a model-agnostic and task-agnostic manner. Specifically, GraphCG learns the semantic-rich directions by maximizing the corresponding mutual information, such that graphs edited along the same direction possess consistent steerable factors. We conduct experiments on two types of graph data, molecular graphs and point clouds. Both the quantitative and qualitative results show the effectiveness of GraphCG in discovering steerable factors.

1. INTRODUCTION

Graphs are a general format for many real-world data. For instance, molecules can be treated as graphs [10, 14], where chemical atoms and bonds correspond to topological nodes and edges respectively. Processing point clouds as graphs is also a popular strategy [53, 58], where points are viewed as nodes and edges are built among the nearest neighbors. Many existing works on deep generative models (DGMs) focus on modeling graph data and improving synthesis quality. However, understanding graph DGMs and their learned representations has been much less explored, which may hinder the development of important applications like controllable generation (also referred to as data editing) and the discovery of interpretable data structure. The graph controllable generation task refers to modifying the steerable factors of a graph so as to obtain graphs with desired properties [9, 43]. This is an important task in many applications, yet traditional methods (e.g., manual editing) have inherent limitations. A classic example is molecule editing, which aims at modifying the substructures of molecules [38] and relates to key tactics in drug discovery such as functional group change [13] and scaffold hopping [2, 23]. This is a routine task in pharmaceutical companies; yet, relying on domain experts for manual editing can be subjective or biased [9, 15]. Different from previous works, this paper explores unsupervised graph editing with DGMs. It can act as a good complement to conventional methods and brings several crucial benefits: (1) It enables efficient graph editing at scale. (2) It alleviates the requirement for extensive domain knowledge to label factor changes. (3) It provides another perspective on editing preference, which reduces biases from domain experts.
One core property relevant to general unsupervised data editing with DGMs is disentanglement. While there is no widely accepted definition of disentanglement, the key intuition [36] is that a disentangled representation should separate the distinct, informative, and steerable factors of variation in the data. Thus, the controllable generation task would become trivial with a disentangled DGM as the backbone. Such a disentanglement assumption has been widely used in generative modeling of image data; e.g., β-VAE [19] learns a disentangled representation by forcing the representation to be close to an isotropic unit Gaussian. However, this may introduce extra constraints on the formulation and expressiveness of DGMs [11, 19, 47, 60]. For graph data, one crucial question remains: is the latent representation space learned by DGMs on graph data disentangled? In image generation, prior work [36] shows that without inductive biases, the latent space learned by VAEs is not guaranteed to be disentangled. However, the disentanglement property of graph DGMs is much less explored. In Sec. 3, we first study the latent space of DGMs on two typical types of graph data (molecular graphs and point clouds), and empirically illustrate that the learned space is not perfectly disentangled. This observation raises a second question: given a pretrained DGM with an imperfectly disentangled latent space, is there a flexible framework enabling graph controllable generation in an unsupervised manner? To tackle this, we propose a model-agnostic and task-agnostic framework called GraphCG for unsupervised graph controllable generation. GraphCG has two main phases, as illustrated in Fig. 1. During the training phase (Fig. 1(a)), GraphCG starts with the assumption that the steerable directions can be learned by maximizing the mutual information (MI) among the semantic directions.
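The disentanglement check described above can be approximated with a simple probe: correlate each latent dimension with a known factor and see whether the signal concentrates in a single dimension. The sketch below is a minimal NumPy illustration, with a synthetic factor `y` and latent codes `Z` standing in for a real DGM's output; the paper's actual analysis in Sec. 3 may use a different metric.

```python
import numpy as np

def factor_correlation(Z, y):
    """Absolute Pearson correlation between each latent dimension and a factor.

    In a perfectly disentangled space, a single steerable factor would
    correlate strongly with one dimension and weakly with all others.
    """
    Zc = Z - Z.mean(axis=0)
    yc = y - y.mean()
    cov = Zc.T @ yc / len(y)
    corr = cov / (Z.std(axis=0) * y.std() + 1e-12)
    return np.abs(corr)

# Synthetic check: dimension 0 encodes the factor, the rest are noise.
rng = np.random.default_rng(0)
y = rng.normal(size=1000)
Z = rng.normal(size=(1000, 8))
Z[:, 0] = y + 0.1 * rng.normal(size=1000)
corr = factor_correlation(Z, y)  # high for dim 0, low elsewhere
```

An imperfectly disentangled space, as observed for the graph DGMs studied here, would instead spread the correlation across several dimensions.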
We formulate GraphCG with an energy-based model (EBM), which provides a large family of solutions. Then, during the test phase, with the learned semantic directions, we can carry out the editing task by moving along a direction with certain step sizes. As illustrated in the example in Fig. 1(b), the molecular structure (hydroxyl group) changes consistently along the editing sequence. For evaluation, we visually verify the learned semantic directions on both types of graph data. Furthermore, for molecular graphs, we propose a novel evaluation metric called the sequence monotonic ratio (SMR) to measure the output sequences. We summarize our contributions as follows: (1) We conduct an empirical study on the disentanglement property of three pretrained deep generative models (DGMs) on two types of graph data, molecular graphs and point clouds. We find that the latent space of these pretrained graph DGMs is not perfectly disentangled. (2) We propose a model-agnostic and task-agnostic method called GraphCG for unsupervised graph controllable generation. GraphCG learns the semantic directions by maximizing their corresponding mutual information, and its outputs are sequences of graphs. (3) We qualitatively evaluate the proposed method on two types of graph data, molecular graphs and point clouds. Moreover, the quantitative results show a clear improvement over the baselines.
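As a concrete illustration of the training objective, the sketch below shows one way such direction-wise mutual information maximization can be instantiated as an InfoNCE-style contrastive loss: the edits of two latent codes along the same direction form a positive pair, while edits along the other directions serve as negatives. The function names (`edit`, `info_nce`) and the cosine-similarity scoring are illustrative assumptions, not the paper's exact EBM formulation.

```python
import numpy as np

def edit(z, directions, i, alpha):
    """Edit latent code z along the (normalized) i-th semantic direction."""
    d = directions[i] / (np.linalg.norm(directions[i]) + 1e-12)
    return z + alpha * d

def info_nce(z_u, z_v, directions, i, alpha, tau=0.1):
    """Contrastive loss for one positive pair: the edit of z_u along
    direction i should match the edit of z_v along the same direction,
    contrasted against z_v edited along every other direction."""
    anchor = edit(z_u, directions, i, alpha)
    cands = np.stack([edit(z_v, directions, j, alpha)
                      for j in range(len(directions))])
    # Cosine similarity between the anchor edit and each candidate edit.
    sims = cands @ anchor / (
        np.linalg.norm(cands, axis=1) * np.linalg.norm(anchor) + 1e-12)
    logits = sims / tau
    # Numerically stable log-softmax; minimized w.r.t. the directions.
    m = logits.max()
    log_z = m + np.log(np.exp(logits - m).sum())
    return -(logits[i] - log_z)
```

In a full training loop, `directions` would be learnable parameters optimized jointly over many sampled latent pairs and step sizes.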

Related work.

Recent works leverage DGMs for various controllable generation tasks [5, 61], where the inherent assumption is that the learned latent representations encode rich semantics, and thus traversal in the latent space can help steer factors of the data [17, 26, 52]. Among them, one research direction [40, 52] uses supervised signals to learn the semantic-rich directions, and most works on editing graph data focus on the supervised setting [27, 57, 64]. However, these approaches cannot be applied to many realistic scenarios where extracting the supervised labels is difficult. Another research line [17, 45, 51] considers discovering the latent semantics in an unsupervised manner, but most unsupervised methods are designed to be either model-specific or task-specific, making them not directly applicable to graph data. A more comprehensive discussion is in Appendix B.

2. BACKGROUND AND PROBLEM FORMULATION

Graph and deep generative models (DGMs). Each graph (consisting of nodes and edges) is denoted as x ∈ X, where X is the data space, and DGMs learn the data distribution, i.e., p(x). Our




Figure 1: (a) The training phase. Given two latent codes z_u and z_v, we edit the four latent representations along the i-th and j-th directions with step sizes α and β respectively. The goal of GraphCG is to align the positive pair (z_u^{i,α} and z_v^{i,α}) and contrast them with z_u^{j,β} and z_v^{j,β} respectively. (b) The test phase. We first sample an anchor molecule and adopt the directions learned in the training phase for editing. With step size α ∈ [-3, 3], we can generate a sequence of molecules. Specifically, after decoding, a functional group change appears: the number of hydroxyl groups decreases along the sequence in the decoded molecules.
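The test-phase procedure in the caption, sweeping a step size along one learned direction from an anchor latent code, can be sketched as follows. The `editing_sequence` helper and the latent dimensionality are hypothetical; each edited code would then be passed through the frozen decoder of the pretrained DGM to obtain a molecule.

```python
import numpy as np

def editing_sequence(z, direction, alphas):
    """Test-phase editing: move an anchor latent code z along one learned
    (normalized) semantic direction with a range of step sizes."""
    d = direction / (np.linalg.norm(direction) + 1e-12)
    return [z + a * d for a in alphas]

rng = np.random.default_rng(0)
z = rng.normal(size=16)   # anchor latent code from the DGM encoder
d = rng.normal(size=16)   # one learned semantic direction
# Step sizes spanning [-3, 3], as in Fig. 1(b); decoding each element
# of `seq` would yield the edited molecule sequence.
seq = editing_sequence(z, d, alphas=np.linspace(-3, 3, 7))
```

The middle element (α = 0) recovers the anchor itself, so the decoded sequence is centered on the original molecule.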

