LEARNING TO GENERATE 3D SHAPES WITH GENERATIVE CELLULAR AUTOMATA

Abstract

We present a probabilistic 3D generative model, named Generative Cellular Automata, which is able to produce diverse and high-quality shapes. We formulate the shape generation process as sampling from the transition kernel of a Markov chain, where the sampling chain eventually evolves to the full shape of the learned distribution. The transition kernel employs the local update rules of cellular automata, effectively reducing the search space in a high-resolution 3D grid space by exploiting the connectivity and sparsity of 3D shapes. Our progressive generation focuses only on the sparse set of occupied voxels and their neighborhood, thus enabling the use of an expressive sparse convolutional network. We propose an effective training scheme that obtains the local homogeneous rule of generative cellular automata from sequences that are slightly different from the sampling chain but converge to the full shapes in the training data. Extensive experiments on probabilistic shape completion and shape generation demonstrate that our method achieves competitive performance against recent methods.

1. INTRODUCTION

Probabilistic 3D shape generation aims to learn and sample from the distribution of diverse 3D shapes, with applications including 3D content generation and robot interaction. Specifically, learning the distribution of shapes or scenes can automate the process of generating diverse and realistic virtual environments or new object designs. Likewise, modeling the conditional distribution of a whole scene given partial raw 3D scans can inform a robot's decision process with the various possible completions of occluded space. The distribution of plausible shapes in 3D space is diverse and complex, and we seek a scalable formulation of the shape generation process. Pioneering works on 3D shape generation regress the entire shape at once (Dai et al. (2017)), which often fails to recover fine details. We propose a more modular approach that progressively generates a shape by a sequence of local updates.

Our work takes inspiration from prior work on autoregressive models in the image domain, such as the variants of pixelCNN (van den Oord & Kalchbrenner (2016); van den Oord et al. (2016; 2017)), which have been successful in image generation. The key idea of pixelCNN (van den Oord et al. (2016)) is to impose an order on the pixels and learn the conditional distribution of the next pixel given all previous pixels; generating an image then becomes the task of sampling pixel-by-pixel in the predefined order. Recently, PointGrow (Sun et al. (2020)) proposed a similar approach for 3D generation, replacing the RGB values of pixels with the coordinates of points and sampling point-by-point in a sequential manner. While that work offers a promising, interpretable generation process that sequentially grows a shape, the required number of sampling steps grows linearly with the number of points, making the model hard to scale to high-resolution data.
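The scalability issue above comes from the structure of autoregressive sampling itself: each element requires one conditional draw given everything sampled so far. A minimal sketch of this pattern (with a toy stand-in for the learned conditional, not any model from the paper) makes the linear cost explicit:

```python
import random

def sample_autoregressive(num_points, cond_dist):
    """Generic autoregressive sampling: each new element is drawn
    conditioned on all previously sampled elements, so generation
    cost grows linearly with the number of elements."""
    points = []
    for _ in range(num_points):
        points.append(cond_dist(points))  # one conditional draw per element
    return points

# Toy 1D conditional standing in for a learned model such as PointGrow:
# the next "point" lands near the last one.
toy = lambda prev: (prev[-1] if prev else 0.0) + random.uniform(-1.0, 1.0)
samples = sample_autoregressive(100, toy)
```

A high-resolution shape with hundreds of thousands of surface points would need that many sequential network evaluations, which motivates the parallel local updates introduced next.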
We believe that a more scalable solution in 3D is to employ the local update rules of cellular automata (CA). A CA is a mathematical model operating on a grid, whose state is a collection of cells carrying values in that grid (Wolfram (1982)). The CA repeatedly mutates its state according to predefined homogeneous update rules that depend only on the spatial neighborhood of each cell. In contrast to conventional CA, where the rules are predefined, we employ a neural network to infer the stochastic sequential transition rule of individual cells based on a Markov chain. The learned homogeneous local rule for individual cells constitutes our 3D generative model, named Generative Cellular Automata (GCA). When the rule is applied to the occupied cells of an arbitrary starting shape, the sequence of local transitions eventually evolves into one instance among the diverse shapes of the multi-modal distribution. The local update rules of CA greatly reduce the search space of voxel occupancy by exploiting the sparsity and connectivity of 3D shapes.

We suggest a simple, progressive training procedure to learn the distribution of local transitions whose repeated application generates shapes from the data distribution. We represent a shape by its surface points stored in a 3D grid, and the transition rule is trained only on the occupied cells by employing a sparse CNN (Graham et al. (2018)).

The contributions of the paper are highlighted as follows: (1) We propose Generative Cellular Automata (GCA), a Markov chain based 3D generative model that iteratively mends a shape toward a learned distribution, generating diverse and high-fidelity shapes. (2) Our work is the first to learn the local update rules of cellular automata for 3D shape generation in voxel representation, which enables the use of an expressive sparse CNN and reduces the search space of voxel occupancy by fully exploiting the sparsity and connectivity of 3D shapes. (3) Extensive experiments show that our method achieves competitive performance against state-of-the-art models in probabilistic shape completion and shape generation.
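The locality argument can be made concrete with a toy sketch: a single CA-style transition only ever touches occupied cells and their immediate neighbors, so the per-step candidate set scales with the surface size rather than the grid volume. The `toy_rule` below is a purely illustrative stand-in for the learned network, not the paper's transition kernel:

```python
import random

def neighbors(cell):
    """26-connected neighborhood of a 3D grid cell, plus the cell itself (27 total)."""
    x, y, z = cell
    return {(x + dx, y + dy, z + dz)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)}

def gca_step(state, rule):
    """One stochastic local transition: only cells in the neighborhood of
    currently occupied cells may change, so the candidate set per step is
    O(|state|) rather than O(grid volume)."""
    candidates = set().union(*(neighbors(c) for c in state))
    return {c for c in candidates if rule(c, state)}

# Toy rule standing in for a learned network: occupy a candidate cell with
# probability proportional to its number of occupied neighbors.
def toy_rule(cell, state):
    k = len(neighbors(cell) & state)
    return random.random() < k / 27

state = {(0, 0, 0)}
for _ in range(5):
    state = gca_step(state, toy_rule)
```

For a 128^3 grid with a few thousand surface cells, each step here evaluates tens of thousands of candidates instead of the roughly two million cells a dense update would consider.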

2. 3D SHAPE GENERATION WITH GENERATIVE CELLULAR AUTOMATA

Let Z^n be an n-dimensional uniform grid space, where n = 3 for a 3D voxel space. A 3D shape is represented as a state s ⊂ Z^3, which is the set of occupied cells c ∈ Z^3 in a binary occupancy grid based on the location of the surface. Note that our voxel representation differs from the conventional occupancy grid, where 1 indicates that a cell is inside the surface; instead, we store only the cells lying on the surface. This representation better exploits the sparsity of a 3D shape than the full occupancy grid.

The shape generation process is presented as a sequence of state variables s_{0:T} drawn from the following Markov chain:

s_0 ∼ p_0,    s_{t+1} ∼ p_θ(s_{t+1} | s_t),    (1)

where p_0 is the initial distribution and p_θ is the homogeneous transition kernel parameterized by θ. We denote the sampled sequence s_0 → s_1 → ⋯ → s_T as a sampling chain. Given the data set D containing 3D shapes x ∈ D, our objective is to learn the parameters θ of the transition kernel p_θ such that the marginal distribution of the final generated sample,

p(s_T) = Σ_{s_{0:T−1}} p_0(s_0) ∏_{0≤t<T} p_θ(s_{t+1} | s_t),

matches the distribution of shapes in D.
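The sampling chain of Eq. (1) is just a loop that repeatedly applies the transition kernel. A minimal sketch, with a toy 1D transition standing in for the learned kernel p_θ (the names `sampling_chain` and `toy_transition` are illustrative, not from the paper):

```python
import random

def sampling_chain(s0, transition, T):
    """Unroll the Markov chain of Eq. (1): s_{t+1} ~ p_theta(. | s_t).
    Returns the full sequence s_0 -> s_1 -> ... -> s_T; the final state
    s_T is the generated sample."""
    chain = [s0]
    for _ in range(T):
        chain.append(transition(chain[-1]))
    return chain

# Toy 1D transition standing in for the learned kernel: each step occupies
# one random cell adjacent to the current state.
def toy_transition(state):
    frontier = {c + d for c in state for d in (-1, 1)} - state
    return state | {random.choice(sorted(frontier))}

chain = sampling_chain(frozenset({0}), toy_transition, T=10)
```

Because the kernel is homogeneous, the same `transition` is applied at every step; all shape-specific behavior lives inside the learned p_θ.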



Figure 1: Sampling chain of shape generation. Full generation processes are included in the supplementary material.

The sparse representation captures high-resolution context information while enjoying the expressive power of deep CNNs, as demonstrated in various computer vision tasks (Krizhevsky et al. (2012); He et al. (2017)). Inspired by Bordes et al. (2017), our model learns from sequences that are slightly different from the sampling chain but converge to the full shapes in the training data. The network successfully learns the update rules of CA, such that a single inference step samples from the distribution of diverse modes along the surface.
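The idea of training on sequences that deviate slightly from the sampling chain yet converge to the data can be sketched in the style of Bordes et al. (2017): mix cells proposed by the current model with cells "infused" from the ground-truth shape, so the training states are gradually pulled toward the complete shape. The function name, the `infusion_rate` parameter, and the independent-Bernoulli infusion below are illustrative assumptions, not the paper's exact procedure:

```python
import random

def infused_next_state(sampled_next, target_shape, infusion_rate=0.3):
    """Build a training state that is slightly different from the sampling
    chain but biased toward the ground truth: union the model's proposed
    cells with a random subset of the target shape's cells (cf. infusion
    training, Bordes et al. (2017))."""
    infused = {c for c in target_shape if random.random() < infusion_rate}
    return set(sampled_next) | infused

# Toy example: the model's proposal covers only part of the target shape;
# infusion pushes the training sequence toward the full shape.
target = {(i, 0, 0) for i in range(8)}
proposal = {(0, 0, 0), (1, 0, 0)}
next_state = infused_next_state(proposal, target, infusion_rate=0.5)
```

Raising the infusion rate over the course of a sequence would make later states converge to the full shape, which is the convergence property the training scheme relies on.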

