LEARNING TO GENERATE 3D SHAPES WITH GENERATIVE CELLULAR AUTOMATA

Abstract

We present a probabilistic 3D generative model, named Generative Cellular Automata, which is able to produce diverse and high-quality shapes. We formulate the shape generation process as sampling from the transition kernel of a Markov chain, where the sampling chain eventually evolves to a full shape from the learned distribution. The transition kernel employs the local update rules of cellular automata, effectively reducing the search space in a high-resolution 3D grid by exploiting the connectivity and sparsity of 3D shapes. Our progressive generation focuses only on the sparse set of occupied voxels and their neighborhood, thus enabling the use of an expressive sparse convolutional network. We propose an effective training scheme that obtains the local homogeneous rule of generative cellular automata from sequences that are slightly different from the sampling chain but converge to the full shapes in the training data. Extensive experiments on probabilistic shape completion and shape generation demonstrate that our method achieves competitive performance against recent methods.
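As a sketch of the sampling procedure summarized above: generation amounts to repeatedly sampling from a transition kernel until the chain settles on a full shape. The 1D toy kernel, target set, and probability below are illustrative stand-ins (in the actual model, a trained sparse convolutional network supplies the transition probabilities over a 3D voxel grid):

```python
import random

# Hypothetical 1D toy: a "shape" is a set of occupied cells on a line.
# TARGET stands in for a shape drawn from the learned distribution; the
# hand-written rule below mimics how the sampling chain grows toward it.
TARGET = set(range(3, 12))

def transition(state):
    """One Markov step: sample a new state conditioned only on the current one."""
    new_state = set(state)
    for cell in state:
        for nbr in (cell - 1, cell + 1):
            # occupy a neighboring cell with some probability if it belongs
            # to the target; a trained network would predict this probability
            if nbr in TARGET and random.random() < 0.5:
                new_state.add(nbr)
    return new_state

state = {7}              # seed: a single occupied cell
for _ in range(50):      # run the sampling chain for a fixed number of steps
    state = transition(state)
print(sorted(state))
```

Because each step only inspects occupied cells and their immediate neighbors, the per-step cost scales with the (sparse) surface of the current state rather than with the whole grid.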

1. INTRODUCTION

Probabilistic 3D shape generation aims to learn and sample from the distribution of diverse 3D shapes, with applications including 3D content generation and robot interaction. Specifically, learning the distribution of shapes or scenes can automate the process of generating diverse and realistic virtual environments or new object designs. Likewise, modeling the conditional distribution of a whole scene given partial raw 3D scans can inform the decision process of a robot by presenting the various possible completions of occluded space. The distribution of plausible shapes in 3D space is diverse and complex, and we seek a scalable formulation of the shape generation process. Pioneering works on 3D shape generation try to regress the entire shape at once (Dai et al., 2017), which often fails to recover fine details. We propose a more modular approach that progressively generates a shape by a sequence of local updates. Our work takes inspiration from prior works on autoregressive models in the image domain, such as the variants of pixelCNN (van den Oord & Kalchbrenner, 2016; van den Oord et al., 2016; 2017), which have been successful in image generation. The key idea of pixelCNN is to order the pixels and then learn the conditional distribution of the next pixel given all of the previous pixels; generating an image then becomes the task of sampling pixel by pixel in the predefined order. Recently, PointGrow (Sun et al., 2020) proposed a similar approach for 3D generation, replacing the RGB values of pixels with the coordinates of points and sampling point by point in a sequential manner. While this work offers a promising, interpretable generation process that sequentially grows a shape, the required number of sampling steps grows linearly with the number of points, making the model hard to scale to high-resolution data. We believe that a more scalable solution in 3D is to employ the local update rules of cellular automata (CA).
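The pixel-by-pixel (or point-by-point) sampling scheme described above can be sketched as follows; `sample_next` is a hypothetical stand-in for the learned conditional p(x_t | x_<t), and the loop makes the linear scaling in the number of elements visible:

```python
import random

# Toy autoregressive sampler in the spirit of pixelCNN / PointGrow:
# elements are generated one at a time, each conditioned on all previous
# ones. The conditional would be a neural network in practice; here it is
# a hypothetical rule that continues a sequence with a bounded step.

def sample_next(prefix):
    """Stand-in conditional: next value is the last value plus a random step."""
    last = prefix[-1] if prefix else 0
    return last + random.choice([0, 1, 2])

def sample_sequence(n):
    seq = []
    for _ in range(n):                # one conditional evaluation per element:
        seq.append(sample_next(seq))  # total cost grows linearly with n
    return seq

print(sample_sequence(8))
```

For a high-resolution voxel grid with tens of thousands of occupied cells, this one-element-per-step loop is exactly the bottleneck the paper's parallel local updates avoid.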
CA, a mathematical model operating on a grid, defines a state as a collection of cells that carry values in the grid (Wolfram, 1982). The CA repeatedly mutates its state based on predefined homogeneous update rules determined only by the spatial neighborhood of each cell. In contrast to conventional CA, where the rules are predefined, we employ a neural network to infer the stochastic sequential transition rule of individual cells based on a Markov chain. The obtained homogeneous local rule for the individual cells constitutes the 3D generative model, named
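A classic CA update of the kind described above can be written in a few lines; here the predefined homogeneous rule is Conway's Game of Life, chosen purely as a familiar illustration (the model proposed in the paper instead learns a stochastic rule with a neural network):

```python
import numpy as np

# Minimal 2D cellular automaton step: every cell is updated by the same
# (homogeneous) rule, which looks only at its local neighborhood.

def ca_step(grid):
    """Apply one synchronous Game-of-Life update to a binary grid."""
    # count the 8 neighbors of every cell by summing shifted copies
    # (np.roll wraps around, i.e. toroidal boundary conditions)
    nbrs = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # Life rule: a cell is alive next step iff it has exactly 3 neighbors,
    # or it is alive now and has exactly 2 neighbors
    return ((nbrs == 3) | ((grid == 1) & (nbrs == 2))).astype(grid.dtype)

# a "blinker" oscillates between a horizontal and a vertical bar
grid = np.zeros((5, 5), dtype=np.uint8)
grid[2, 1:4] = 1
print(ca_step(grid))
```

The same neighborhood structure carries over to 3D voxel grids, where restricting updates to occupied cells and their neighbors keeps each step sparse.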




