DIGRESS: DISCRETE DENOISING DIFFUSION FOR GRAPH GENERATION

Abstract

This work introduces DiGress, a discrete denoising diffusion model for generating graphs with categorical node and edge attributes. Our model utilizes a discrete diffusion process that progressively edits graphs with noise by adding or removing edges and changing node and edge categories. A graph transformer network is trained to revert this process, simplifying the problem of distribution learning over graphs into a sequence of node and edge classification tasks. We further improve sample quality by introducing a Markovian noise model that preserves the marginal distribution of node and edge types during diffusion, and by incorporating auxiliary graph-theoretic features. A procedure for conditioning the generation on graph-level features is also proposed. DiGress achieves state-of-the-art performance on molecular and non-molecular datasets, with up to 3x validity improvement on a planar graph dataset. It is also the first model to scale to the large GuacaMol dataset containing 1.3M drug-like molecules without the use of molecule-specific representations.

1. INTRODUCTION

Denoising diffusion models (Sohl-Dickstein et al., 2015; Ho et al., 2020) form a powerful class of generative models. At a high level, these models are trained to denoise diffusion trajectories, and produce new samples by sampling noise and recursively denoising it. Diffusion models have been used successfully in a variety of settings, outperforming all other methods on image and video (Dhariwal & Nichol, 2021; Ho et al., 2022). These successes raise hope for building powerful models for graph generation, a task with diverse applications such as molecule design (Liu et al., 2018), traffic modeling (Yu & Gu, 2019), and code completion (Brockschmidt et al., 2019). However, generating graphs remains challenging due to their unordered nature and sparsity properties. Previous diffusion models for graphs proposed to embed the graphs in a continuous space and add Gaussian noise to the node features and graph adjacency matrix (Niu et al., 2020; Jo et al., 2022). This however destroys the graph's sparsity and creates complete noisy graphs for which structural information (such as connectivity or cycle counts) is not defined. As a result, continuous diffusion can make it difficult for the denoising network to capture the structural properties of the data.

In this work, we propose DiGress, a discrete denoising diffusion model for generating graphs with categorical node and edge attributes. Our noise model is a Markov process consisting of successive graph edits (edge addition or deletion, node or edge category edit) that can occur independently on each node or edge. To invert this diffusion process, we train a graph transformer network to predict the clean graph from a noisy input. The resulting architecture is permutation equivariant and admits an evidence lower bound for likelihood estimation.
We then propose several algorithmic enhancements to DiGress: a noise model that preserves the marginal distribution of node and edge types during diffusion, a novel guidance procedure for conditioning graph generation on graph-level properties, and the augmentation of the denoising network's input with auxiliary structural and spectral features. These features, derived from the noisy graph, help overcome the limited representation power of graph neural networks (Xu et al., 2019). Their use is made possible by the discrete nature of our noise model, which, in contrast to Gaussian-based models, preserves sparsity in the noisy graphs. These improvements enhance the performance of DiGress on a wide range of graph generation tasks. Our experiments demonstrate that DiGress achieves state-of-the-art performance, generating a high rate of realistic graphs while maintaining a high degree of diversity and novelty. On the large MOSES and GuacaMol molecular datasets, which were previously too large for one-shot models, it notably matches the performance of autoregressive models trained using expert knowledge.

2. DIFFUSION MODELS

In this section, we introduce the key concepts of denoising diffusion models that are agnostic to the data modality. These models consist of two main components: a noise model and a denoising neural network. The noise model $q$ progressively corrupts a data point $x$ to create a sequence of increasingly noisy data points $(z_1, \dots, z_T)$. It has a Markovian structure: $q(z_1, \dots, z_T | x) = q(z_1 | x) \prod_{t=2}^{T} q(z_t | z_{t-1})$. The denoising network $\phi_\theta$ is trained to invert this process by predicting $z_{t-1}$ from $z_t$. To generate new samples, noise is sampled from a prior distribution and then inverted by iterative application of the denoising network. While early models would directly predict $z_{t-1}$ from $z_t$ (Sohl-Dickstein et al., 2015), they were difficult to train due to the dependence of $z_{t-1}$ on the sampled diffusion trajectories. Ho et al. (2020) considerably improved performance by establishing a connection with score-based models (Song & Ermon, 2019). They showed that when $\int q(z_{t-1} | z_t, x) \, dp_\theta(x)$ is tractable, $x$ can be used as the target of the denoising network, which removes an important source of label noise. For a diffusion model to be efficient, three properties are required:

1. The distribution $q(z_t | x)$ should have a closed-form formula, to allow for parallel training on different time steps.
2. The posterior $p_\theta(z_{t-1} | z_t) = \int q(z_{t-1} | z_t, x) \, dp_\theta(x)$ should have a closed-form expression, so that $x$ can be used as the target of the neural network.
3. The limit distribution $q_\infty = \lim_{T \to \infty} q(z_T | x)$ should not depend on $x$, so that it can be used as a prior distribution for inference.

These properties are all satisfied when the noise is Gaussian. When the task requires modeling categorical data, Gaussian noise can still be used by embedding the data in a continuous space with a one-hot encoding of the categories (Niu et al., 2020; Jo et al., 2022).
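As a concrete illustration of Properties 1 and 3 in the Gaussian case, consider the following minimal NumPy sketch. This is a generic toy example, not the paper's graph model: the cosine-shaped schedule `cosine_abar` and all variable names are illustrative assumptions.

```python
import numpy as np

def cosine_abar(T):
    """Hypothetical cumulative signal level abar_t, decreasing from ~1 to 0."""
    t = np.arange(1, T + 1)
    return np.cos(0.5 * np.pi * t / T) ** 2

def sample_zt(x, abar_t, rng):
    """Property 1: sample z_t ~ q(z_t | x) = N(sqrt(abar_t) x, (1 - abar_t) I)
    in closed form at any step t, without simulating the whole Markov chain."""
    return np.sqrt(abar_t) * x + np.sqrt(1.0 - abar_t) * rng.standard_normal(x.shape)

rng = np.random.default_rng(0)
x = rng.standard_normal(8)            # a toy continuous data point
abar = cosine_abar(1000)

z_early = sample_zt(x, abar[0], rng)  # nearly clean: close to x
z_late = sample_zt(x, abar[-1], rng)  # Property 3: ~N(0, I), independent of x
```

Because `q(z_t | x)` has this closed form, training can draw random timesteps in parallel rather than unrolling the chain step by step.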
We develop in Appendix A a graph generation model based on this principle and use it for ablation studies. However, Gaussian noise is a poor noise model for graphs, as it destroys sparsity as well as graph-theoretic notions such as connectivity. Discrete diffusion therefore seems more appropriate for graph generation tasks. Recent works have considered the discrete diffusion problem for text, image and audio data (Hoogeboom et al., 2021; Johnson et al., 2021; Yang et al., 2022). We follow here the setting proposed by Austin et al. (2021). It considers a data point $x$ that belongs to one of $d$ classes, with $x \in \mathbb{R}^d$ its one-hot encoding. The noise is now represented by transition matrices $(Q_1, \dots, Q_T)$ such that $[Q_t]_{ij}$ represents the probability of jumping from state $i$ to state $j$: $q(z_t | z_{t-1}) = z_{t-1} Q_t$. As the process is Markovian, the transition matrix from $x$ to $z_t$ reads $\bar{Q}_t = Q_1 Q_2 \cdots Q_t$. As long as $\bar{Q}_t$ is precomputed or has a closed-form expression, the noisy states $z_t$ can be built from $x$ using $q(z_t | x) = x \bar{Q}_t$, without having to apply noise recursively (Property 1). The posterior distribution $q(z_{t-1} | z_t, x)$ can also be computed in closed form using Bayes' rule (Property 2): $q(z_{t-1} | z_t, x) \propto z_t Q_t' \odot x \bar{Q}_{t-1}$, where $\odot$ denotes a pointwise product and $Q'$ is the transpose of $Q$ (derivation in Appendix D). Finally, the limit distribution of the noise model depends on the transition model. The simplest and most common one is a uniform transition (Hoogeboom et al., 2021; Austin et al., 2021; Yang et al., 2022) parametrized by $Q_t = \alpha_t I + (1 - \alpha_t) \mathbf{1}_d \mathbf{1}_d' / d$, with $\alpha_t$ transitioning from 1 to 0. When $\lim_{t \to \infty} \alpha_t = 0$, $q(z_t | x)$ converges to a uniform distribution independently of $x$ (Property 3).
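The transition-matrix machinery above can be sketched numerically. The following NumPy snippet implements uniform-transition discrete diffusion for a single categorical variable; the state count $d = 4$, horizon $T = 50$, and linear $\alpha_t$ schedule are illustrative assumptions, not the schedule used in the paper.

```python
import numpy as np

d, T = 4, 50                          # illustrative: 4 categories, 50 steps
alphas = np.linspace(1.0, 0.0, T)     # hypothetical schedule from 1 down to 0

def uniform_Qt(alpha, d):
    """Uniform transition matrix Q_t = alpha * I + (1 - alpha) * 11'/d."""
    return alpha * np.eye(d) + (1.0 - alpha) * np.ones((d, d)) / d

# Precompute cumulative products Qbar[t] = Q_1 Q_2 ... Q_{t+1} (Property 1).
Qbar = [uniform_Qt(alphas[0], d)]
for t in range(1, T):
    Qbar.append(Qbar[-1] @ uniform_Qt(alphas[t], d))

x = np.eye(d)[2]                      # one-hot data point, class 2

# q(z_t | x) = x Qbar_t: sample a noisy state directly at an arbitrary step.
rng = np.random.default_rng(0)
t = 10
probs = x @ Qbar[t]
z_t = np.eye(d)[rng.choice(d, p=probs / probs.sum())]

# Posterior q(z_{t-1} | z_t, x) ∝ (z_t Q_t') ⊙ (x Qbar_{t-1}) (Property 2).
post = (z_t @ uniform_Qt(alphas[t], d).T) * (x @ Qbar[t - 1])
post = post / post.sum()

# Limit distribution: since alpha -> 0, q(z_T | x) is uniform (Property 3).
q_zT = x @ Qbar[-1]
```

Note that each $Q_t$ here is doubly stochastic, so the final marginal `q_zT` is exactly uniform over the $d$ states regardless of the starting class, which is what makes the uniform distribution usable as the sampling prior.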

