GRAPHCG: UNSUPERVISED DISCOVERY OF STEERABLE FACTORS IN GRAPHS

Abstract

Deep generative models have been widely developed for graph data such as molecular graphs and point clouds. Yet, much less investigation has been carried out on understanding the learned latent space of deep graph generative models. Such understanding can open up a unified perspective and provide guidelines for essential tasks like controllable generation. To this end, this work develops a method called GraphCG for the unsupervised discovery of steerable factors in the latent space of deep graph generative models. We first examine the representation space of recent deep generative models trained on graph data and observe that the learned representation space is not perfectly disentangled. Motivated by this observation, we design our method to discover steerable factors of graph data in a model-agnostic and task-agnostic manner. Specifically, GraphCG learns semantically rich directions by maximizing the corresponding mutual information, so that graphs edited along the same direction share the same steerable factors. We conduct experiments on two types of graph data, molecular graphs and point clouds. Both quantitative and qualitative results show the effectiveness of GraphCG in discovering steerable factors.
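As a minimal illustration of what editing "along a direction" in latent space means, the sketch below applies the simplest linear edit z' = z + α·d/‖d‖ to a batch of latent codes for several step sizes α. The linear form, the unit normalization, and the function name `edit_along_direction` are assumptions made for illustration; GraphCG's actual editing function and the way its directions are learned may differ, and decoding the edited codes back into graphs depends on the specific deep generative model.

```python
import numpy as np

def edit_along_direction(z, direction, alphas):
    """Hypothetical linear latent edit: move every code along one direction.

    z:         (n, d) latent codes produced by some graph generative model
    direction: (d,) editing direction (assumed given, e.g. learned elsewhere)
    alphas:    (k,) step sizes; negative values move in the opposite sense
    Returns a (k, n, d) array with one edited batch per step size.
    """
    z = np.asarray(z, dtype=float)
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)  # unit norm, so alpha is the Euclidean step length
    alphas = np.asarray(alphas, dtype=float)
    # Broadcasting: (1, n, d) + (k, 1, 1) * (1, 1, d) -> (k, n, d)
    return z[None, :, :] + alphas[:, None, None] * d[None, None, :]
```

For example, `edit_along_direction(np.zeros((2, 3)), [0.0, 0.0, 2.0], [-1.0, 0.0, 1.0])` returns three batches in which every code has been shifted by -1, 0, and +1 along the third latent axis.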

1. INTRODUCTION

Graphs are a general format for many kinds of real-world data. For instance, molecules can be treated as graphs [10, 14], where chemical atoms and bonds correspond to topological nodes and edges, respectively. Processing point clouds as graphs is also a popular strategy [53, 58], where points are viewed as nodes and edges are built among nearest neighbors. Many existing works on deep generative models (DGMs) focus on modeling graph data and improving synthesis quality. However, understanding DGMs on graphs and their learned representations has been much less explored, which may hinder the development of important applications such as controllable generation (also referred to as data editing) and the discovery of interpretable data structure. The graph controllable generation task refers to modifying the steerable factors of a graph so as to obtain graphs with desired properties [9, 43]. This task is important in many applications, but traditional methods (e.g., manual editing) have inherent limitations in some circumstances. A classic example is molecule editing, which aims to modify the substructures of molecules [38] and relates to key tactics in drug discovery such as functional group change [13] and scaffold hopping [2, 23]. Molecule editing is a routine task in pharmaceutical companies, yet relying on domain experts for manual editing can be subjective or biased [9, 15].
Unlike previous works, this paper explores unsupervised graph editing with DGMs. It can act as a good complement to conventional methods and brings several crucial benefits: (1) it enables efficient graph editing in large-scale settings; (2) it reduces the need for extensive domain knowledge to label factor changes; and (3) it provides another perspective on editing preferences, which reduces biases from domain experts.
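The nearest-neighbor construction mentioned above for treating point clouds as graphs can be sketched as follows. This is a brute-force version under stated assumptions: Euclidean distance, a fixed number of neighbors k, directed edges, and no self-loops. The function name `knn_edges` is hypothetical, and the cited works may use other choices (e.g., radius graphs or symmetrized edges).

```python
import numpy as np

def knn_edges(points, k):
    """Hypothetical k-nearest-neighbor edge list for a point cloud.

    points: (n, 3) coordinates. Returns an (n*k, 2) integer array of
    (source, target) pairs linking each point to its k nearest other points.
    Uses O(n^2) memory, so it is only suitable for small clouds.
    """
    pts = np.asarray(points, dtype=float)
    n = pts.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    # Pairwise squared Euclidean distances, shape (n, n).
    diff = pts[:, None, :] - pts[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(dist2, np.inf)  # a point is never its own neighbor
    # First k columns hold the k smallest distances per row (unordered among themselves).
    nbrs = np.argpartition(dist2, k - 1, axis=1)[:, :k]
    src = np.repeat(np.arange(n), k)
    return np.stack([src, nbrs.reshape(-1)], axis=1)
```

For example, four collinear points at x = 0, 1, 3, 10 with k = 1 yield the edges (0, 1), (1, 0), (2, 1), (3, 2). A spatial index such as a k-d tree would avoid the quadratic distance matrix for large point clouds.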
One core property relevant to general unsupervised data editing with DGMs is disentanglement. Although there is no widely accepted definition of disentanglement, the key intuition [36] is that a disentangled representation should separate the distinct, informative, and steerable factors of variation in the data. Thus, with a disentangled DGM as the backbone, the controllable generation task would become trivial. Such a disentanglement assumption has been widely used in generative modeling on image data; e.g., β-VAE [19] learns disentangled representations by more strongly forcing the representation toward an isotropic unit Gaussian prior, through a KL-divergence term weighted by β > 1. However, such disentanglement-oriented designs may impose extra constraints on the formulations and expressiveness of DGMs [11, 19, 47, 60].
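To make the β-VAE constraint above concrete, the standard objective minimizes the negative evidence lower bound with its KL term scaled by β, so that β = 1 recovers the plain VAE and larger β pulls the encoder distribution more strongly toward the isotropic unit Gaussian prior. The NumPy sketch below computes this loss for a single example, assuming a diagonal Gaussian encoder and a standard normal prior; the function name `beta_vae_loss` and passing the reconstruction log-likelihood as a precomputed scalar are illustrative assumptions.

```python
import numpy as np

def beta_vae_loss(recon_log_lik, mu, logvar, beta):
    """Negative beta-weighted ELBO for one example (illustrative sketch).

    recon_log_lik: log p(x | z) for a sampled z, assumed computed by a decoder
    mu, logvar:    mean and log-variance of the diagonal Gaussian q(z | x), shape (d,)
    beta:          weight on the KL term; beta > 1 strengthens the pull to N(0, I)
    """
    mu = np.asarray(mu, dtype=float)
    logvar = np.asarray(logvar, dtype=float)
    # Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ).
    kl = 0.5 * np.sum(mu ** 2 + np.exp(logvar) - 1.0 - logvar)
    return -recon_log_lik + beta * kl
```

When the encoder output already matches the prior (mu = 0, logvar = 0), the KL term vanishes and the loss reduces to the negative reconstruction log-likelihood; any deviation from the prior is penalized β times more heavily than in a plain VAE.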

