NEURAL VOLUMETRIC MESH GENERATOR

Abstract

Deep generative models have shown success in generating 3D shapes with different representations. In this work, we propose Neural Volumetric Mesh Generator (NVMG), which can generate novel and high-quality volumetric meshes. Unlike previous 3D generative models for point clouds, voxels, and implicit surfaces, the volumetric mesh representation is a ready-to-use representation in industry, with details on both the surface and the interior. Generating such highly structured data thus poses a significant challenge. We first propose a diffusion-based generative model to tackle this problem by generating voxelized shapes with close-to-reality outlines and structures. From the voxelized shape, we can directly obtain a tetrahedral mesh as a template. Further, we use a voxel-conditional neural network to predict a smooth implicit surface conditioned on the voxels, and progressively project the tetrahedral mesh onto the predicted surface under regularizations. The regularization terms are carefully designed so that they (1) eliminate defects such as flipped and highly distorted elements, and (2) enforce regularity of the interior and surface structure during the deformation procedure, yielding a high-quality final mesh. As shown in the experiments, our pipeline can generate high-quality, artifact-free volumetric and surface meshes from random noise or a reference image without any postprocessing. Compared with the state-of-the-art voxel-to-mesh deformation method, our approach is more robust and performs better when taking generated voxels as input.

1. INTRODUCTION

How to automatically create high-quality, accessible, and editable 3D content is a key problem in visual computing. Although generative models have revealed their power on audio and image synthesis (Goodfellow et al., 2014; Kingma & Welling, 2013; Higgins et al., 2016; Brock et al., 2018; Ho et al., 2019; Song & Ermon, 2019; Ho et al., 2020; Song et al., 2020), their performance on 3D shapes remains limited. A major challenge for current methods is the representation of 3D shapes. Many generative models focus on point clouds (Fan et al., 2017; Yang et al., 2019; Cai et al., 2020; Luo & Hu, 2021a; b). However, it is non-trivial to convert point clouds to other shape representations. Another line of work (Park et al., 2019; Niemeyer & Geiger, 2021; Schwarz et al., 2020; Jain et al., 2021) directly learns to generate implicit representations of shapes, e.g., the neural radiance field (NeRF) (Mildenhall et al., 2020). However, for applications such as physical simulation, the implicit representation needs to be converted into an explicit representation such as a mesh, which is itself not a completely solved problem. In this work, we consider the problem of directly generating ready-to-use volumetric meshes. The volumetric mesh is one of the most important representations of 3D shapes and is widely adopted in computer graphics and engineering (Nieser et al., 2011; Hang, 2015; Hu et al., 2018). However, it is difficult to generate with off-the-shelf generative models due to a number of geometric constraints (Aigerman & Lipman, 2013; Li et al., 2007; 2020; 2021; Ni et al., 2021). Without careful handling of these constraints, the generated meshes exhibit various defects, including flipped elements, self-intersections, and large distortion.
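For concreteness, the flipping defect mentioned above can be diagnosed numerically: a tetrahedral element is flipped when its signed volume is non-positive. A minimal NumPy sketch (illustrative only; the helper names are our own and not part of the NVMG pipeline):

```python
import numpy as np

def signed_volume(v0, v1, v2, v3):
    """Signed volume of a tetrahedron; non-positive means the element is inverted."""
    return np.linalg.det(np.stack([v1 - v0, v2 - v0, v3 - v0])) / 6.0

def count_flipped(vertices, tets):
    """Count flipped tetrahedra in a volumetric mesh.

    vertices: (N, 3) float array of vertex positions
    tets:     (M, 4) int array of vertex indices per tetrahedron
    """
    flipped = 0
    for t in tets:
        v0, v1, v2, v3 = vertices[t]
        if signed_volume(v0, v1, v2, v3) <= 0.0:
            flipped += 1
    return flipped
```

Swapping any two vertex indices of a valid element negates its signed volume, which is why such a check is a standard sanity test for deformation-based mesh generation.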
To overcome these constraints, existing methods (Wang et al., 2018; Wen et al., 2019; Gupta & Chandraker, 2020; Shi et al., 2021) for deep mesh generation usually learn a deformation of a template mesh, e.g., an ellipsoid mesh, to obtain a new mesh. Unfortunately, the use of a template mesh limits the topology (number of holes) of the generated meshes and the magnitude of the deformation. Thus, we present a novel pipeline, termed Neural Volumetric Mesh Generator (NVMG), for learning generative models of volumetric meshes. Rather than designing a neural network that operates directly on the mesh representation, NVMG takes a two-level hybrid approach. First, we utilize the generalization behavior of diffusion models (Ho et al., 2020) on a voxel-based representation to obtain an initial synthesis result. We then use another neural network to predict how to perturb the initial synthesis, adding geometric details. At the second level, NVMG employs an optimization procedure to control the quality of the final mesh, addressing flipped and distorted faces. The optimization procedure seamlessly incorporates the neural predictions, combining the strengths of learned priors and optimization-based mesh refinement. We empirically evaluate NVMG on unconditional volumetric mesh generation and conditional image-to-mesh generation tasks. We show that NVMG outperforms the state-of-the-art point cloud generator (Zhou et al., 2021) and image-to-mesh generator (Shi et al., 2021). When we compare our mesh deformation module with the state-of-the-art voxel-to-mesh methods and Neural Dual Contouring, we obtain better results on converting generated voxels to meshes.
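To make the voxel-to-template step concrete, one standard way to obtain a tetrahedral template from an occupancy grid is to split each occupied cell into six tetrahedra along a cube diagonal, gluing shared corners between neighboring cells. The sketch below is illustrative (the function names are ours, and NVMG's actual meshing scheme may differ):

```python
import numpy as np

# Corner offsets of a unit voxel; index i has bits (x, y, z).
CORNERS = [(x, y, z) for z in (0, 1) for y in (0, 1) for x in (0, 1)]

# Six tetrahedra that tile the cube along the main diagonal 0-7,
# all with positive orientation.
CUBE_TETS = [(0, 1, 3, 7), (0, 3, 2, 7), (0, 2, 6, 7),
             (0, 6, 4, 7), (0, 4, 5, 7), (0, 5, 1, 7)]

def voxels_to_tet_mesh(occ):
    """Convert a boolean occupancy grid (X, Y, Z) to (vertices, tets).

    Shared corners are deduplicated so neighbouring voxels are glued
    into one connected volumetric mesh.
    """
    vert_id = {}
    vertices = []
    tets = []
    for x, y, z in zip(*np.nonzero(occ)):
        ids = []
        for dx, dy, dz in CORNERS:
            key = (int(x) + dx, int(y) + dy, int(z) + dz)
            if key not in vert_id:
                vert_id[key] = len(vertices)
                vertices.append(key)
            ids.append(vert_id[key])
        for a, b, c, d in CUBE_TETS:
            tets.append((ids[a], ids[b], ids[c], ids[d]))
    return np.array(vertices, dtype=float), np.array(tets, dtype=int)
```

A single voxel yields 8 vertices and 6 tetrahedra; two adjacent voxels share a face of 4 corners, giving 12 vertices and 12 tetrahedra in total.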

2. RELATED WORK

Mesh Generation with Deep Neural Networks Despite the generality of deep learning, how to incorporate deep neural networks efficiently in mesh generation remains an open problem. The main difficulty is that a valid mesh must satisfy a series of geometric constraints. To deal with these constraints, the majority of previous works use a mesh template and train a neural network to deform the template into the mesh of interest (Wang et al., 2018; Wen et al., 2019; Gupta & Chandraker, 2020; Shi et al., 2021; Kanazawa et al., 2018; Pan et al., 2019; Uy et al., 2020). However, the generated meshes are limited by the pre-defined template in several aspects, e.g., topology and the magnitude of deformation. Another branch of work learns an implicit field to reconstruct the mesh from a point cloud or voxel prior (Venkatesh et al., 2021; Chen et al., 2022; Shen et al., 2021b; Chen & Zhang, 2019; 2021; Gao et al., 2020; Remelli et al., 2020; Chen et al., 2020; 2021; Chibane et al., 2020b; Sitzmann et al., 2020; Williams et al., 2019). These works rely on Marching Cubes or a variant thereof to convert the learned implicit representation to a surface. However, these methods have three main issues. First, most require a point cloud on the surface as input for reconstruction, which makes them unsuitable for generating novel shapes. Second, converting an implicit surface to a mesh is not robust to the shape topology and may lose parts or holes. Third, the implicit surface representation cannot model the shape interior. PolyGen (Nash et al., 2020) uses a sequential model to generate vertices and faces one by one; however, this kind of autoregressive model built on transformer structures is costly and exhibits only limited generalization.
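The part-loss issue is easy to reproduce even without running Marching Cubes: if a feature is thinner than the sampling grid of the implicit field, no grid node falls inside it, so the extracted surface drops the part entirely. A self-contained toy illustration (the SDF, center, and resolutions below are our own hypothetical example):

```python
import numpy as np

def sphere_sdf(points, center, radius):
    """Signed distance to a sphere: negative inside, positive outside."""
    return np.linalg.norm(points - center, axis=-1) - radius

def inside_samples(axis, center, radius):
    """Number of grid nodes that fall inside the implicit shape."""
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    return int((sphere_sdf(grid, center, radius) < 0.0).sum())

center = np.array([0.125, 0.125, 0.125])  # deliberately between coarse nodes
radius = 0.1                              # thinner than the coarse spacing

coarse = np.linspace(-1.0, 1.0, 9)   # spacing 0.25: no node lands inside
fine = np.linspace(-1.0, 1.0, 65)    # spacing ~0.031: the part is captured
```

On the coarse grid the field has no negative samples, so Marching Cubes would extract no surface for this component at all; only the finer grid recovers it.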
GET3D (Gao et al., 2022) generates high-quality 3D textured meshes by bridging recent advances in differentiable surface modeling, differentiable rendering, and 2D generative adversarial networks.



Figure 1: A gallery of generated volumetric meshes. Both the surface and the interior of volumetric meshes generated by NVMG are artifact-free.

3D Shape Generation Generating 3D shapes is a key challenge in computer vision and computer graphics (Hartley & Zisserman, 2003; Moons et al., 2010). Traditional methods focus on reconstructing 3D shapes from multiple views (Hartley & Zisserman, 2003; Furukawa & Ponce, 2009) using geometric cues. With the power of deep learning, researchers have advanced the field significantly with data-driven methods. Through the prior knowledge captured by deep neural networks, current algorithms can generate point clouds from a single image (Fan et al., 2017; Yang et al., 2019; Achlioptas et al., 2018; Luo & Hu, 2021b; a; Zhou et al., 2021), improve multi-view reconstruction (Yao et al., 2018; Chen et al., 2019), and create high-quality implicit surfaces (Mildenhall et al., 2020; Niemeyer & Geiger, 2021; Genova et al., 2020). Different from these works, ours focuses on an even harder 3D representation, the volumetric mesh. Instead of generating point positions or a signed distance field in 3D space, we need to provide the vertex positions and the connectivity structure of both the surface and the interior. This complex structure makes the generative model extremely hard to learn.

