SEQUENTIAL BRICK ASSEMBLY WITH EFFICIENT CONSTRAINT SATISFACTION

Abstract

We tackle the problem of assembling LEGO bricks, which can be viewed as an instance of combinatorial optimization. The problem is challenging because the number of possible structures increases exponentially with the number of available bricks, and complex physical constraints between bricks must be satisfied. To solve this problem, our method assesses a brick structure to predict the next brick position and its confidence by employing a U-shaped sparse 3D convolutional network. A convolution filter efficiently validates the physical constraints in a parallelizable and scalable manner and allows us to process different brick types. To generate a novel structure, we devise a sampling strategy that determines the next brick position by considering attachable positions under the physical constraints. Instead of using handcrafted brick assembly datasets, our model is trained on a large number of 3D objects, which enables it to create new high-fidelity structures. We demonstrate that our method successfully generates diverse brick structures while handling two different brick types, and that it outperforms existing methods based on Bayesian optimization, a deep graph generative model, and reinforcement learning, all of which consider assembling a single brick type. To show the validity of our method, elaborate studies on various assembly scenarios are also presented.

1. INTRODUCTION

Most real-world 3D structures are constructed from smaller primitives. A broad range of studies have tackled assembly problems such as molecule generation (Ertl et al., 2017; Neil et al., 2018; You et al., 2018), building construction (Talton et al., 2011; Martinovic & Van Gool, 2013; Ritchie et al., 2015), and part assembly generation (Sung et al., 2017; Lee et al., 2021; Jones et al., 2021). However, if a primitive unit (e.g., a LEGO brick) is used to construct a 3D structure, the task can be viewed as an instance of combinatorial optimization. In particular, the search space of the primitive assembly problem increases exponentially as the search depth increases: given n primitives with k possible combinations, the search space increases as O(n^k). In addition, if a constraint between unit primitives must be satisfied, the problem becomes more challenging.

In this work, we tackle the problem of assembling LEGO bricks into a 3D structure. The generation problem of brick assembly is defined as a sequential decision-making process that appends one new brick at a time by determining its type, position, and direction. Moreover, a brick to be assembled has to satisfy the physical constraints: no overlap between bricks, no isolated bricks, and LEGO-like brick connections; see Section 2 for the details of these constraints. Compared to general 3D generation methods (Wu et al., 2016; Gadelha et al., 2018; Achlioptas et al., 2018), sequential brick assembly has the advantages that it can generate a structure in an open space and that it provides instructions for building the structure.

Several attempts have been made at sequential brick assembly by utilizing Bayesian optimization (Kim et al., 2020), a deep graph generative model (Thompson et al., 2020), and reinforcement learning (Chung et al., 2021).
Those methods are capable of assembling 2×4 LEGO bricks while respecting the physical constraints. However, they have several limitations. The Bayesian optimization method proposed by Kim et al. (2020) requires heavy computation because of its iterative optimization process for each brick position. Thompson et al. (2020) propose using masks to filter out invalid actions, but the use of masks degrades assembly performance. To predict a valid action, Chung et al. (2021) propose employing a neural network, which cannot always predict proper moves. More importantly, these methods share common limitations: they consider only a single brick type (i.e., the 2×4 brick) due to their inherent design choices or technical difficulties, and they inevitably become slower in validating the constraints as the number of bricks increases, due to exponentially increasing search spaces.

Table 1: Comparison of the previous approaches and our method in terms of unit primitives, algorithms, target conditioning, and constraint satisfaction. SA stands for sequential assembly. Note that our approach does not provide guidance (images or target shapes) when generating a new shape.
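To make the three physical constraints concrete, the following minimal sketch checks a single candidate brick against an existing structure. It is an illustration under simplifying assumptions, not the formulation of Section 2: bricks are one layer tall, the structure is a set of occupied (x, y, z) cells, and a connection is modeled as at least one occupied cell directly above or below the new brick. The names `brick_cells` and `is_valid_placement` are hypothetical.

```python
from typing import Set, Tuple

Cell = Tuple[int, int, int]  # (x, y, z) voxel coordinate; z is the height


def brick_cells(x: int, y: int, z: int, width: int, depth: int) -> Set[Cell]:
    """Cells covered by a one-layer brick whose corner cell is (x, y, z)."""
    return {(x + dx, y + dy, z) for dx in range(width) for dy in range(depth)}


def is_valid_placement(occupied: Set[Cell], new_brick: Set[Cell]) -> bool:
    """Check the simplified constraints for one candidate brick."""
    # No overlap: the new brick must not share a cell with the structure.
    if occupied & new_brick:
        return False
    # Connection (assumed model): some cell directly above or below the new
    # brick is already occupied. In this model a connected brick is never
    # isolated, and bricks that only touch side by side do not count.
    return any(
        (cx, cy, cz + dz) in occupied
        for (cx, cy, cz) in new_brick
        for dz in (-1, 1)
    )


# A 2x4 brick (4 cells wide, 2 cells deep) at the origin.
base = brick_cells(0, 0, 0, width=4, depth=2)
stacked = is_valid_placement(base, brick_cells(2, 0, 1, 4, 2))      # on top, shifted
overlapping = is_valid_placement(base, brick_cells(2, 0, 0, 4, 2))  # same layer
side_by_side = is_valid_placement(base, brick_cells(4, 0, 0, 4, 2))
```

Checking candidates one at a time like this is the kind of per-position validation whose cost grows with the structure, which is the limitation the paragraph above describes.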


To tackle the limitations above, we devise a novel brick assembly method with a U-shaped sparse 3D convolutional neural network, which uses a convolution filter to validate complex brick constraints efficiently. With the sampling procedure proposed in this work, our method can create diverse assembly sequences of structures during training, which allows us to build training episodes. We also use the sampling procedure to generate high-fidelity brick structures. In addition, validating the physical constraints with a convolution filter makes our method easily parallelizable and scalable to different brick types. Henceforth, we refer to our method as sequential Brick assembly with Efficient Constraint Satisfaction (BrECS).

We carry out two scenarios of sequential brick assembly: completion and generation tasks. A completion task is to assemble bricks from a given partial structure, and a generation task is to create a brick structure from scratch; see Section 5 for their details. In the completion task, BrECS achieves 30.5% higher IoU scores on average than the best-performing baseline (as reported in Table 2), while being eight times faster than that baseline. In the generation task, BrECS performs the best, with 117% higher average classification scores than the next best method (as reported in Table 3).

In summary, the contributions of this work are as follows:
• We propose a novel generation model for sequential brick assembly, which validates physical constraints with a convolution filter in parallel and generates high-fidelity 3D structures;
• We propose a U-shaped 3D sparse convolutional network that can be trained with a voxel dataset, which enables us to assemble high-fidelity structures by exploiting a popular 3D dataset, i.e., ModelNet40 (Wu et al., 2015);
• We show that our model successfully assembles two different brick types, i.e., 2×4 and 2×2 LEGO bricks, in various experiments.
We will release our code and datasets upon publication.
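The idea of validating constraints with a filter and then sampling among attachable positions can be sketched as follows. This is a hedged illustration, not the BrECS implementation: nested loops stand in for the response that a convolution with an all-ones, brick-shaped kernel would compute at every position at once, the connection rule (contact with the layer directly above or below) is a simplifying assumption, and the uniform `scores` are placeholders for network outputs. All function names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Grid = List[List[List[int]]]     # grid[z][y][x] == 1 if the voxel is occupied
Position = Tuple[int, int, int]  # (x, y, z) of the brick's corner voxel


def window_sum(layer: List[List[int]], x: int, y: int,
               width: int, depth: int) -> int:
    """Response of an all-ones width-by-depth filter at corner (x, y)."""
    return sum(layer[y + dy][x + dx]
               for dy in range(depth) for dx in range(width))


def attachable_positions(grid: Grid, width: int, depth: int) -> List[Position]:
    """Corner positions where a one-layer brick fits and connects."""
    size_z, size_y, size_x = len(grid), len(grid[0]), len(grid[0][0])
    empty = [[0] * size_x for _ in range(size_y)]
    valid = []
    for z in range(size_z):
        below = grid[z - 1] if z > 0 else empty
        above = grid[z + 1] if z + 1 < size_z else empty
        for y in range(size_y - depth + 1):
            for x in range(size_x - width + 1):
                # Zero response on the same layer means no overlap.
                overlap = window_sum(grid[z], x, y, width, depth)
                # Positive response on a neighboring layer means contact.
                contact = (window_sum(below, x, y, width, depth)
                           + window_sum(above, x, y, width, depth))
                if overlap == 0 and contact > 0:
                    valid.append((x, y, z))
    return valid


def sample_next(valid: List[Position], scores: Dict[Position, float],
                rng: random.Random) -> Position:
    """Draw the next position with probability proportional to its score."""
    return rng.choices(valid, weights=[scores[p] for p in valid], k=1)[0]


# A 6 x 2 x 2 grid holding one 2x4 brick in the bottom layer.
grid = [[[0] * 6 for _ in range(2)] for _ in range(2)]
for y in range(2):
    for x in range(4):
        grid[0][y][x] = 1

valid = attachable_positions(grid, width=4, depth=2)
scores = {p: 1.0 for p in valid}  # placeholder for predicted confidences
next_pos = sample_next(valid, scores, random.Random(0))
```

Because the check depends only on the kernel shape, a different brick type or direction is handled by changing `width` and `depth` (for example, swapping them for a rotated brick), which is one way to read the scalability claim above.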



Wu et al. (2016) propose to use a generative adversarial network to generate voxel occupancy. Choy et al. (2019) suggest a method that generates 3D shapes on voxel grids with a fully convolutional neural network that includes efficient sparse convolution. Zhang et al. (2021) propose a method that generates high-quality voxel-based shapes by iteratively applying 3D convolutional neural networks. The points of a point cloud lie in a continuous, unbounded space and are (possibly) infinite in number. In contrast, voxels are similar to pixels: their coordinates are discretized and limited. Our method employs a voxel-based generation model to generate brick-wise scores, since candidates for brick positions are discrete and the number of bricks is finite.
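The contrast between continuous points and discrete voxels can be illustrated with a small sketch that maps real-valued points to integer voxel coordinates. The function name `voxelize` and the `voxel_size` parameter are assumptions made for this example and do not come from the methods cited above.

```python
import math
from typing import Iterable, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(points: Iterable[Point], voxel_size: float) -> Set[Voxel]:
    """Map continuous points to the set of voxels that contain them."""
    return {
        (math.floor(px / voxel_size),
         math.floor(py / voxel_size),
         math.floor(pz / voxel_size))
        for (px, py, pz) in points
    }


points = [(0.1, 0.2, 0.3), (0.4, 0.1, 0.2), (1.2, 0.0, -0.1)]
voxels = voxelize(points, voxel_size=0.5)
```

The first two points fall into the same voxel, so three continuous points become two discrete cells; candidate brick positions live on such a finite grid.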

