LEARNING MANIFOLD PATCH-BASED REPRESENTATIONS OF MAN-MADE SHAPES

Abstract

Choosing the right representation for geometry is crucial for making 3D models compatible with existing applications. Focusing on piecewise-smooth man-made shapes, we propose a new representation that is usable in conventional CAD modeling pipelines and can also be learned by deep neural networks. We demonstrate its benefits by applying it to the task of sketch-based modeling. Given a raster image, our system infers a set of parametric surfaces that realize the input in 3D. To capture piecewise-smooth geometry, we learn a special shape representation: a deformable parametric template composed of Coons patches. Naïvely training such a system, however, is hampered by non-manifold artifacts in the parametric shapes and by a lack of data. To address this, we introduce loss functions that bias the network to output non-self-intersecting shapes and implement them as part of a fully self-supervised system, automatically generating both shape templates and synthetic training data. We develop a testbed for sketch-based modeling, demonstrate shape interpolation, and provide comparisons to related work.

1. INTRODUCTION

While state-of-the-art deep learning systems that output 3D geometry as point clouds, triangle meshes, voxel grids, and implicit surfaces yield detailed results, these representations are dense, high-dimensional, and incompatible with CAD modeling pipelines. In this work, we develop a 3D representation that is parsimonious, geometrically interpretable, and easily editable with standard tools, while being compatible with deep learning. This enables a shape modeling system that leverages the ability of neural networks to process incomplete, ambiguous input while producing useful, consistent 3D output. Our primary technical contributions involve the development of machinery for learning parametric 3D surfaces in a fashion that is efficiently compatible with modern deep learning pipelines and effective for a challenging 3D modeling task. We automatically infer a template per shape category and incorporate loss functions that operate explicitly on the geometry rather than in the parametric domain or on a sampling of surrounding space. Extending learning methodologies from images and point sets to more exotic modalities like networks of surface patches is a central theme of modern graphics, vision, and learning research, and we anticipate broad application of these developments in CAD workflows.

To test our system, we choose sketch-based modeling as a target application. Converting rough, incomplete 2D input into a clean, complete 3D shape is extremely ill-posed, requiring hallucination of missing parts and interpretation of a noisy signal. To cope with these ambiguities, existing systems either rely on hand-designed priors, severely limiting applications, or learn the shapes from data, implicitly inferring relevant priors (Delanoy et al., 2018; Wang et al., 2018a; Lun et al., 2017). However, the output of the latter methods often lacks the resolution and sharp features necessary for high-quality 3D modeling.
In industrial design, man-made shapes are typically modeled as collections of smooth parametric patches (e.g., NURBS surfaces) whose boundaries form the sharp features. To learn such shapes effectively, we use a deformable parametric template (Jain et al., 1998): a manifold surface composed of patches, each parameterized by control points (Fig. 3a). This representation enables the model to control the smoothness of each patch and introduce sharp edges between patches where necessary. Training a model for such representations faces three major challenges: detection of non-manifold surfaces, structural variation within shape categories, and lack of data. We address them as follows:

• We introduce several loss functions that encourage our patch-based output to form a manifold mesh without topological artifacts or self-intersections.

• Some categories of man-made shapes exhibit structural variation. To address this, for each category we algorithmically generate a deformable template with varying structure, which allows us to represent structural variation using a variable number of parts (§3.1).

• Supervised methods mapping from sketches to 3D require a database of sketch–model pairs, and, to date, there are no such large-scale repositories. We use a synthetic sketch augmentation pipeline inspired by the artistic literature to simulate variations observed in natural drawings (§4.1). Although our model is trained on synthetic sketches, it generalizes to natural sketches.

Our method is self-supervised: we predict patch parameters, but our data is not labeled with patches.
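To make the patch primitive concrete, the following is a minimal NumPy sketch of bilinearly blended Coons interpolation, which builds a surface point from four boundary curves. The boundary curves here are arbitrary callables supplied by the caller; how they are parameterized (e.g., as Bézier curves over predicted control points) is left to the template and is not specified by this sketch.

```python
import numpy as np

def coons_patch(c0, c1, d0, d1, u, v):
    """Evaluate a bilinearly blended Coons patch at (u, v) in [0, 1]^2.

    c0(u), c1(u) are the two u-direction boundary curves; d0(v), d1(v)
    are the two v-direction boundaries. The curves must agree at the
    four corners, e.g. c0(0) == d0(0) and c1(1) == d1(1).
    """
    # Ruled surface blending the two u-direction boundaries
    ruled_c = (1 - v) * c0(u) + v * c1(u)
    # Ruled surface blending the two v-direction boundaries
    ruled_d = (1 - u) * d0(v) + u * d1(v)
    # Bilinear interpolation of the four corner points
    corners = ((1 - u) * (1 - v) * c0(0.0) + u * (1 - v) * c0(1.0)
               + (1 - u) * v * c1(0.0) + u * v * c1(1.0))
    # Classic Coons formula: sum of ruled surfaces minus the bilinear term
    return ruled_c + ruled_d - corners
```

By construction, the patch interpolates its four boundary curves exactly, so patches in a template that share a boundary curve meet without gaps, which is what allows sharp edges to be placed exactly at patch boundaries.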

2.1. DEEP LEARNING FOR SHAPE RECONSTRUCTION

Learning to reconstruct 3D geometry has recently enjoyed significant research interest. Typical forms of input are images (Gao et al., 2019; Wu et al., 2017; Delanoy et al., 2018; Häne et al., 2019) and point clouds (Williams et al., 2019; Groueix et al., 2018; Park et al., 2019). When designing a network for this task, two considerations affect the architecture: the loss function and the geometric representation.

Loss Functions. One popular direction employs a differentiable renderer and measures a 2D image loss between a rendering of the inferred 3D model and the input image (Kato et al., 2018; Wu et al., 2016; Yan et al., 2016; Rezende et al., 2016; Wu et al., 2017; Tulsiani et al., 2017b; 2018). A notable example is the work by Wu et al. (2017), which learns a mapping from a photograph to a normal map, a depth map, and a silhouette, and a mapping from these outputs to a voxelization. They use a differentiable renderer and measure inconsistencies in 2D. Hand-drawn sketches, however, cannot be interpreted as perfect projections of 3D objects: they are imprecise and often inconsistent (Bessmeltsev et al., 2016). Another approach uses 3D loss functions, measuring discrepancies between the predicted and target 3D shapes directly, often via Chamfer or a regularized Wasserstein distance (Williams et al., 2019; Liu et al., 2010; Mandikal et al., 2018; Groueix et al., 2018; Park et al., 2019; Gao et al., 2019; Häne et al., 2019). We build on this work, adapting the Chamfer distance to patch-based geometric representations and extending the loss function with new regularizers (§3.2).

Shape representation. As noted by Park et al. (2019), geometric representations in deep learning can broadly be divided into three classes: voxel-, point-, and mesh-based. Voxel-based methods (Delanoy et al., 2018; Wu et al., 2017; Zhang et al., 2018; Wu et al., 2018) yield dense reconstructions that are limited in resolution, offer no topological guarantees, and cannot represent sharp features. Point-based approaches represent geometry as a point cloud (Yin et al., 2018; Mandikal et al., 2018; Fan et al., 2017; Lun et al., 2017; Yang et al., 2018), sidestepping memory issues, but do not capture manifold connectivity.

Figure 1: Editing a 3D model produced by our method. Because we output 3D geometry as a collection of consistent, well-placed NURBS patches, user edits can be made in conventional CAD software by simply moving control points. Here, we are able to refine the trunk of a car model with just a few clicks.
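The 3D loss functions discussed above are typically built on the symmetric Chamfer distance between sampled point sets. A minimal NumPy sketch is below; the adaptation to patch-based representations (sampling points from predicted patches and adding regularizers, §3.2) is not shown.

```python
import numpy as np

def chamfer_distance(P, Q):
    """Symmetric Chamfer distance between point sets P (n, 3) and Q (m, 3).

    Averages the squared nearest-neighbor distance in both directions,
    so neither set can "hide" points far from the other.
    """
    # Pairwise squared distances via broadcasting, shape (n, m)
    d2 = np.sum((P[:, None, :] - Q[None, :, :]) ** 2, axis=-1)
    # For each point in P, its nearest point in Q, and vice versa
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```

The brute-force pairwise matrix is fine for the few thousand samples typical in training losses; in practice the same quantity is computed with nearest-neighbor acceleration structures or batched GPU kernels.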

