AUTOREGRESSIVE DIFFUSION MODEL FOR GRAPH GENERATION

Anonymous authors
Paper under double-blind review

Abstract

Diffusion-based graph generative models have recently obtained promising results for graph generation. However, existing diffusion-based graph generative models are all one-shot generative models that apply Gaussian diffusion in the dequantized adjacency matrix space. Such a strategy can suffer from difficulty in model training, slow sampling speed, and inability to incorporate constraints. We propose an autoregressive diffusion model for graph generation. Unlike existing methods, we define a node-absorbing diffusion process that operates directly in the discrete graph space. For forward diffusion, we design a diffusion ordering network, which learns an optimal node-absorbing ordering from graph topology. For reverse generation, we design a denoising network that uses the reverse node ordering to efficiently reconstruct the graph by predicting one row of the adjacency matrix at a time. Based on the permutation invariance of graph generation, we show that the two networks can be jointly trained by optimizing a simple lower bound of the data likelihood. Our experiments on six diverse datasets show that our model achieves generation performance better than or comparable to the previous state of the art, while enjoying fast generation speed.

1. INTRODUCTION

Generating graphs from a target distribution is a fundamental problem in many domains such as drug discovery (Li et al., 2018), material design (Maziarka et al., 2020), social network analysis (Grover et al., 2019), and public health (Yu et al., 2020). Deep generative models have recently led to promising advances on this problem. Different from traditional random graph models (Erdős et al., 1960; Albert & Barabási, 2002), these methods fit graph data with powerful deep generative models, including variational auto-encoders (Simonovsky & Komodakis, 2018), generative adversarial networks (Maziarka et al., 2020), normalizing flows (Madhawa et al., 2019), and energy-based models (Liu et al., 2021). These models learn to capture complex graph structural patterns and then generate new high-fidelity graphs with desired properties. Recently, the emergence of probabilistic diffusion models has led to interest in diffusion-based graph generation (Jo et al., 2022). Diffusion models decompose the full, complex transformation between noise and real data into many small steps of simple diffusion. Compared with prior deep generative models, diffusion models enjoy both flexibility in modeling architecture and tractability of the model's probability distributions.

We propose an autoregressive graph generative model named GRAPHARM via autoregressive diffusion on graphs. Autoregressive diffusion (ARM) (Hoogeboom et al., 2022a) is an absorbing diffusion process (Austin et al., 2021) for discrete data, in which exactly one dimension of the data decays to the absorbing state at each diffusion step. In GRAPHARM, we design a node-absorbing autoregressive diffusion process for graphs, which diffuses a graph directly in the discrete graph space instead of in the dequantized adjacency matrix space.
The forward pass absorbs one node at each step, masking it along with its connecting edges; this is repeated until all nodes are absorbed and the graph becomes empty. We further design a diffusion ordering network in GRAPHARM, which is jointly trained with the reverse generator to learn an optimal node ordering for diffusion. Compared with the random ordering used in prior ARM models (Hoogeboom et al., 2022a), the learned diffusion ordering not only provides a better stochastic approximation of the true marginal graph likelihood, but also eases generative model training by leveraging structural regularities.

The backward pass in GRAPHARM recovers the graph structure by learning to reverse the node-absorbing diffusion process with a denoising network. The reverse generative process is autoregressive, which makes it easy for GRAPHARM to handle constraints during generation. However, a key challenge is learning the distribution of reverse node orderings for optimizing the data likelihood. We show that this difficulty can be circumvented by simply using the exact reverse node ordering and optimizing a simple lower bound of the likelihood, based on the permutation invariance property of graph generation. The likelihood lower bound allows the denoising network and the diffusion ordering network to be jointly trained with a reinforcement learning procedure and gradient descent.

The generation speed of GRAPHARM is much faster than that of existing graph diffusion models (Jo et al., 2022; Niu et al., 2020). Due to the autoregressive diffusion process in the node space, the number of diffusion steps in GRAPHARM equals the number of nodes, which is typically much smaller than the number of sampling steps in (Jo et al., 2022; Niu et al., 2020). Furthermore, at each step of the backward pass, the denoising network predicts one row of the adjacency matrix at a time. The predicted edges follow a mixture of Bernoulli distributions, which captures the dependencies among them.
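To illustrate the row-wise edge prediction, here is a minimal sketch of sampling one adjacency-matrix row from a mixture of Bernoulli distributions. This is our own illustration, not the authors' implementation; the function names and the toy mixture parameters are assumptions.

```python
import random

def sample_edge_row(mix_weights, edge_probs, rng=random):
    """Sample one adjacency-matrix row from a mixture of Bernoulli
    distributions: first draw a mixture component, then draw each edge
    independently from that component's edge probabilities. Edges are
    independent only *given* the component, so marginally they remain
    dependent on each other."""
    # k ~ Categorical(mix_weights)
    r, acc, k = rng.random(), 0.0, len(mix_weights) - 1
    for i, w in enumerate(mix_weights):
        acc += w
        if r < acc:
            k = i
            break
    # each edge j ~ Bernoulli(edge_probs[k][j])
    return [1 if rng.random() < p else 0 for p in edge_probs[k]]

# hypothetical 2-component mixture over a row of 4 candidate edges:
# component 0 favors the first two edges, component 1 the last two
row = sample_edge_row([0.6, 0.4],
                      [[0.9, 0.9, 0.1, 0.1],
                       [0.1, 0.1, 0.9, 0.9]])
assert len(row) == 4 and all(e in (0, 1) for e in row)
```

Because the edges share the sampled component, the row tends to be internally consistent (e.g., mostly the first two edges or mostly the last two), which a product of independent Bernoullis could not express.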
Such row-wise prediction makes GRAPHARM much more efficient than most previous autoregressive graph generation models. Our key contributions are as follows: (1) To the best of our knowledge, our work is the first autoregressive diffusion-based graph generation model, underpinned by a new node-absorbing diffusion process. (2) GRAPHARM learns an optimal node generation ordering and thus better leverages structural regularities for autoregressive graph diffusion. (3) We validate our method on both synthetic and real-world graph generation tasks, showing that GRAPHARM outperforms existing graph generative models while being efficient in generation speed.
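The overall reverse process described above can be summarized in a short skeleton: nodes are generated one at a time (so generation takes exactly n steps), and at each step one full row of edges to the previously generated nodes is predicted. This sketch is our own simplification; `predict_edges` stands in for the denoising network and is a hypothetical name.

```python
import random

def generate_graph(n, predict_edges, rng=random):
    """Skeleton of an autoregressive reverse process: at step t a new
    node is added and the (stand-in) denoising network predicts the
    probabilities of its edges to the t already-generated nodes."""
    adj = [[0] * n for _ in range(n)]
    for t in range(n):                    # one new node per step
        probs = predict_edges(adj, t)     # edge probs to nodes 0..t-1
        for u in range(t):
            if rng.random() < probs[u]:
                adj[t][u] = adj[u][t] = 1
    return adj

# toy stand-in for the denoising network: connect each new node to the
# previous one with probability 1, which yields a path graph
adj = generate_graph(4, lambda adj, t: [1.0 if u == t - 1 else 0.0
                                        for u in range(t)])
assert adj[0][1] == adj[1][2] == adj[2][3] == 1
```

Note that the loop runs n times regardless of the edge count, which is the source of the O(n)-step generation claimed above; diffusion models that denoise the whole dequantized adjacency matrix typically need hundreds to thousands of denoising steps instead.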

2. BACKGROUND

Diffusion Model and Absorbing Diffusion. Given a training instance x_0 ∈ R^D sampled from the underlying distribution p_data(x_0), a diffusion model defines a forward Markov transition kernel q(x_t | x_{t-1}) that gradually corrupts the training data until the data distribution is transformed into a simple noise distribution. The model then learns to reverse this process with a denoising transition kernel p_θ(x_{t-1} | x_t) parameterized by a neural network. Most existing diffusion models use Gaussian diffusion for continuous-state data. To apply Gaussian diffusion to discrete data, one can use a dequantization method that adds small noise to the data. However, dequantization distorts the original discrete distribution, which can make diffusion-based models difficult to train. For example, dequantizing graph adjacency matrices can destroy graph connectivity information and hurt message passing. Austin et al. (2021) and Hoogeboom et al. (2021) introduce several discrete state-space diffusion models using different Markov transition matrices. Among them, absorbing diffusion is the most promising due to its simplicity and strong empirical performance.

Definition 1 (Absorbing Discrete Diffusion). An absorbing diffusion is a Markov destruction process defined on a discrete state space. At transition time step t, each element x_t^{(i)} in dimension i independently decays into an absorbing state with probability α(t). The absorbing state can be a [MASK] token for text or a gray pixel for images (Austin et al., 2021). The diffusion process converges to a stationary distribution that places all mass on the absorbing state.
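To connect Definition 1 to graphs, the following minimal sketch runs a node-absorbing forward pass: at each step one node decays to the absorbing state together with all of its incident edges, until the whole graph is absorbed. This is our own illustration under assumed conventions (a list-based adjacency matrix and a sentinel MASK value), not the authors' code.

```python
MASK = -1  # absorbing state for nodes and edges

def absorb_node(adj, node_state, v):
    """Mask node v and all of its incident edges (one diffusion step)."""
    n = len(adj)
    adj = [row[:] for row in adj]
    node_state = node_state[:]
    node_state[v] = MASK
    for u in range(n):
        adj[v][u] = MASK
        adj[u][v] = MASK
    return adj, node_state

def forward_diffusion(adj, ordering):
    """Absorb nodes one at a time following `ordering`; in GRAPHARM this
    ordering would be sampled from the learned diffusion ordering
    network, while prior ARM models use a uniformly random ordering."""
    node_state = [0] * len(adj)  # 0 = present
    traj = [(adj, node_state)]
    for v in ordering:
        adj, node_state = absorb_node(adj, node_state, v)
        traj.append((adj, node_state))
    return traj

# 3-node path graph 0-1-2, absorbed in the order [1, 0, 2]
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
traj = forward_diffusion(A, ordering=[1, 0, 2])
final_adj, final_nodes = traj[-1]
assert all(e == MASK for row in final_adj for e in row)
assert all(s == MASK for s in final_nodes)
```

After n steps the stationary point of Definition 1 is reached: every node and edge carries the absorbing state, i.e., the graph is empty. The reverse generative process learns to undo exactly this trajectory.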

To the best of our knowledge, there are two existing works on diffusion-based graph generation: Niu et al. (2020) model adjacency matrices using score matching at different noise scales and use annealed Langevin dynamics to sample new adjacency matrices; Jo et al. (2022) propose a continuous-time graph diffusion model that jointly models adjacency matrices and node features through stochastic differential equations (SDEs). However, these diffusion-based graph generative models suffer from three key drawbacks. (1) Generation efficiency: the sampling processes of Niu et al. (2020) and Jo et al. (2022) are slow, as the former requires a large number of diffusion noise levels and the latter needs to solve a complex system of SDEs. (2) Continuous approximation: both convert discrete graphs to continuous state spaces by adding real-valued noise to graph adjacency matrices; such dequantization can distort the distribution of the original discrete graph structures, increasing the difficulty of model training. (3) Incorporating constraints: both are one-shot generation models and hence cannot easily incorporate constraints during generation.

