MESHDIFFUSION: SCORE-BASED GENERATIVE 3D MESH MODELING



Zhen Liu 1,2 *, Yao Feng 2,3, Michael J. Black 2, Derek Nowrouzezahrai 4, Liam Paull 1, Weiyang Liu 2,5

1 Mila, Université de Montréal&nbsp;&nbsp;2 Max Planck Institute for Intelligent Systems, Tübingen

ABSTRACT

We consider the task of generating realistic 3D shapes, which is useful for a variety of applications such as automatic scene generation and physical simulation. Compared to other 3D representations like voxels and point clouds, meshes are more desirable in practice, because (1) they enable easy and arbitrary manipulation of shapes for relighting and simulation, and (2) they can fully leverage the power of modern graphics pipelines, which are mostly optimized for meshes. Previous scalable methods for generating meshes typically rely on sub-optimal post-processing, and they tend to produce overly smooth or noisy surfaces without fine-grained geometric details. To overcome these shortcomings, we take advantage of the graph structure of meshes and use a simple yet very effective generative modeling method to generate 3D meshes. Specifically, we represent meshes with deformable tetrahedral grids, and then train a diffusion model on this direct parametrization. We demonstrate the effectiveness of our model on multiple generative tasks.
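To make the parametrization concrete: a deformable tetrahedral grid assigns each grid vertex a 3D deformation offset and a signed-distance value, so a shape becomes a fixed-size tensor on which a standard diffusion model can be trained. The sketch below is illustrative only; the array layout, sizes, and the plain DDPM forward-noising step are assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical grid size: N vertices, each carrying a 3D deformation
# offset plus one signed-distance (SDF) value, i.e. an (N, 4) tensor.
N = 1024
rng = np.random.default_rng(0)
shape = np.concatenate(
    [rng.normal(scale=0.01, size=(N, 3)),  # per-vertex deformations
     rng.normal(size=(N, 1))],             # per-vertex SDF values
    axis=1,
)

def forward_diffuse(x0, t, betas, rng):
    """Standard DDPM forward process:
    q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    alphas_bar = np.cumprod(1.0 - betas)
    abar_t = alphas_bar[t]
    noise = rng.normal(size=x0.shape)
    return np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * noise

betas = np.linspace(1e-4, 2e-2, 1000)  # common linear noise schedule
x_t = forward_diffuse(shape, t=500, betas=betas, rng=rng)
print(x_t.shape)  # (1024, 4)
```

A denoising network would then be trained to predict the added noise from `x_t` and `t`; at sampling time, the reverse process yields a clean grid from which a surface mesh can be extracted.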

1. INTRODUCTION

As one of the most challenging tasks in computer vision and graphics, generative modeling of high-quality 3D shapes is of great significance in many applications such as virtual reality and the metaverse [11]. Traditional methods for generative 3D shape modeling are usually built upon voxel [51] or point cloud [1] representations, mostly because ground-truth data in these representations are relatively easy to obtain and convenient to process. Neither representation, however, produces fine-level surface geometry, and therefore neither can be used for photorealistic rendering of shapes with different materials under different lighting conditions. And despite being convenient to process



Figure 1: (a) Unconditionally generated 3D mesh samples randomly selected from the proposed MeshDiffusion, a simple diffusion model trained on a direct parametrization of 3D meshes without bells and whistles. (b) 3D mesh samples generated by MeshDiffusion with text-conditioned textures from [39]. MeshDiffusion produces highly realistic and fine-grained geometric details while being easy and stable to train.

