ON THE EFFECTIVENESS OF WEIGHT-ENCODED NEURAL IMPLICIT 3D SHAPES

Anonymous

Abstract

A neural implicit outputs a number indicating whether a given query point in space is inside, outside, or on a surface. Many prior works have focused on latent-encoded neural implicits, where a latent vector encoding of a specific shape is also fed as input. While affording latent-space interpolation, this comes at the cost of reconstruction accuracy for any single shape. By training a specific network for each 3D shape, a weight-encoded neural implicit may forgo the latent vector and focus its reconstruction accuracy on the details of a single shape. While previously considered as an intermediary representation for 3D scanning tasks or as a toy problem leading up to latent-encoding tasks, weight-encoded neural implicits have not yet been taken seriously as a 3D shape representation. In this paper, we establish that weight-encoded neural implicits meet the criteria of a first-class 3D shape representation. We introduce a suite of technical contributions to improve reconstruction accuracy, convergence, and robustness when learning the signed distance field induced by a polygonal mesh, the de facto standard representation. Viewed as a lossy compression, our conversion outperforms standard techniques from geometry processing. Compared to previous latent- and weight-encoded neural implicits, we demonstrate superior robustness, scalability, and performance.

1. INTRODUCTION

While 3D surface representation has been a foundational topic of study in the computer graphics community for over four decades, recent developments in machine learning have highlighted the potential of neural networks as effective parameterizations of solid shapes. The success of neural approaches to shape representation has been evidenced both through their ability to represent complex geometries and through their utility in end-to-end 3D shape learning, reconstruction, and understanding tasks. These approaches also make use of the growing availability of user-generated 3D content and high-fidelity 3D capture devices, e.g., point cloud scanners. For these 3D tasks, one powerful configuration is to represent a 3D surface S as the set containing any point x ∈ R^3 for which an implicit function (i.e., a neural network) evaluates to zero:

S := {x ∈ R^3 | f_θ(x; z) = 0},    (1)

where θ ∈ R^m are the network weights and z ∈ R^k is an input latent vector encoding a particular shape. In contrast to the de facto standard polygonal mesh representation, which explicitly discretizes a surface's geometry, the function f implicitly defines the shape S encoded in z. We refer to the representation in Eq. (1) as a latent-encoded neural implicit. Park et al. (2019) propose to optimize the weights θ so that each shape S_i ∈ D in a dataset or shape distribution D is encoded into a corresponding latent vector z_i. If successfully trained, the weights θ of their DEEPSDF implicit function f_θ can be said to generalize across the "shape space" of D. As always with supervision, reducing the training set from D will affect f's ability to generalize and can lead to overfitting. Doing so may seem, at first, to be an ill-fated and uninteresting idea.
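To make the querying of a latent-encoded implicit concrete, the following minimal sketch evaluates f_θ(x; z) as a small fully connected network whose input is the concatenation of the query point x and the latent code z. The layer sizes, latent dimension k, and random (untrained) weights are illustrative assumptions, not the DEEPSDF architecture.

```python
import numpy as np

def latent_implicit(x, z, weights):
    """Evaluate a toy latent-encoded implicit f_theta(x; z).

    x: (3,) query point; z: (k,) latent code selecting the shape.
    weights: list of (W, b) pairs for a small ReLU network.
    This architecture is illustrative, not the one from the paper.
    """
    h = np.concatenate([x, z])          # the shape code is an *input*
    for W, b in weights[:-1]:
        h = np.maximum(W @ h + b, 0.0)  # ReLU hidden layers
    W, b = weights[-1]
    return float(W @ h + b)             # scalar signed value

# Random weights just to exercise the query path end-to-end.
rng = np.random.default_rng(0)
k, hidden = 4, 16
dims = [3 + k, hidden, hidden, 1]
weights = [(rng.standard_normal((dims[i + 1], dims[i])) * 0.1,
            np.zeros(dims[i + 1])) for i in range(len(dims) - 1)]

z = rng.standard_normal(k)              # one latent vector per shape
value = latent_implicit(np.zeros(3), z, weights)
inside = value < 0.0                    # sign convention: negative = inside
```

Switching which shape is queried means switching z while θ stays fixed; this is exactly the input the weight-encoded setting below removes.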

Our work considers an extreme case: when the training set is reduced to a single shape S_i. We can draw a simple but powerful conclusion: in this setting, one can completely forgo the latent vector (i.e., k = 0). From the perspective of learning the shape space of D, we can "purposefully overfit" a network to a single shape S_i:

S_i := {x ∈ R^3 | f_{θ_i}(x) = 0},

where θ_i now parameterizes a weight-encoded neural implicit for the single shape S_i. In the pursuit of learning the "space of shapes," representing a single shape as a weight-encoded neural implicit has been discarded as a basic validation check or stepping stone toward the ultimate goal of generalizing over many shapes (see, e.g., (Chen & Zhang, 2019; Park et al., 2019; Atzmon & Lipman, 2020a;b)). Weight-encoded neural implicits, while not novel, have been overlooked as a valuable shape representation beyond learning and computer vision tasks. For example, the original DEEPSDF work briefly considered, and nearly immediately discarded, the idea of independently encoding each shape of a large collection: "Training a specific neural network for each shape is neither feasible nor very useful." (Park et al., 2019). We propose training a specific neural network for each shape and will show that this approach is both feasible and very useful.

                  I     II    III   IV
Point cloud       ×     ×     •     ×/•
Mesh              •     ×     •     ×
Regular grid      •     •     ×     •
Adaptive grid     •     •     •     ×
Neural implicit   •     •     •     •

We establish that a weight-encoded neural implicit meets the criteria of a first-class representation for 3D shapes, ready for direct use in graphics and geometry processing pipelines (see inset table). While common solid shape representations have some important features and miss others, neural implicits provide a new and rich constellation of features. Unstructured point clouds are often the raw output of 3D scanners, but do not admit straightforward smooth surface visualization (I).
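The "purposeful overfitting" above can be sketched end to end on a toy shape: fit a small MLP with no latent input to the analytic SDF of a sphere, so that the weights θ_i alone encode the shape. Everything here (layer sizes, learning rate, sample counts, plain SGD on an L2 loss) is an illustrative assumption, not the training regime this paper proposes.

```python
import numpy as np

rng = np.random.default_rng(1)

def sphere_sdf(x):
    """Analytic signed distance to a sphere of radius 0.5 (the 'mesh' stand-in)."""
    return np.linalg.norm(x, axis=-1) - 0.5

# theta_i = (W1, b1, W2, b2) is the *entire* representation of this one shape.
W1 = rng.standard_normal((32, 3)) * 0.5
b1 = np.zeros(32)
W2 = rng.standard_normal((1, 32)) * 0.5
b2 = np.zeros(1)

def f(x):
    """Weight-encoded implicit f_theta(x): the only input is the query point."""
    h = np.maximum(x @ W1.T + b1, 0.0)
    return (h @ W2.T + b2)[:, 0]

def loss(x):
    return float(np.mean((f(x) - sphere_sdf(x)) ** 2))

x_eval = rng.uniform(-1.0, 1.0, (512, 3))
loss_before = loss(x_eval)

lr = 0.01
for _ in range(500):  # plain SGD, "purposefully overfitting" the single shape
    x = rng.uniform(-1.0, 1.0, (256, 3))
    h = np.maximum(x @ W1.T + b1, 0.0)
    err = ((h @ W2.T + b2)[:, 0] - sphere_sdf(x))[:, None]   # (n, 1)
    gW2 = 2.0 * (err * h).mean(axis=0, keepdims=True)        # (1, 32)
    gb2 = 2.0 * err.mean(axis=0)                             # (1,)
    dh = (err @ W2) * (h > 0.0)                              # back through ReLU
    gW1 = 2.0 * dh.T @ x / len(x)                            # (32, 3)
    gb1 = 2.0 * dh.mean(axis=0)                              # (32,)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

loss_after = loss(x_eval)  # overfitting drives the SDF error down
```

Note that there is nothing to generalize here: success is measured only by how well f_θ reproduces this one signed distance field.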
While meshes are the de facto standard representation, conducting signed distance queries and CSG operations on them remains non-trivial (II). Signed distances or occupancies stored on a regular grid admit fast spatial queries and are vectorizable just like 2D images, but they wastefully sample space uniformly rather than compactly adapting their storage budget to a particular shape (III). Adaptive or sparse grids are more economical but, just as meshes will have a different number of vertices and faces, adaptive grids will have different storage profiles and access paths, precluding consistent data vectorization (IV). While previous methods have explored weight-encoded neural implicits as an intermediary representation for scene reconstruction (e.g., (Mildenhall et al., 2020)) and noisy point-cloud surfacing tasks (e.g., (Atzmon & Lipman, 2020a;b)), we consider neural implicits as the primary geometric representation. Beyond this observational contribution, our technical contributions include a proposed architecture and training regime for converting the (currently) most widely adopted 3D geometry format, polygonal meshes, into a weight-encoded neural implicit representation. We report on experiments with different architectures, sampling techniques, and activation functions, including positional encoding (Mildenhall et al., 2020) and sinusoidal activation approaches (Sitzmann et al., 2020b) that have proven powerful in the context of neural implicits. Compared to existing training regimes, we benefit from memory improvements (directly impacting visualization performance), stability to perturbed input data, and scalability to large datasets. Weight-encoded neural implicits can be treated as an efficient, lossy compression for 3D shapes.
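The CSG operations noted above as non-trivial on meshes reduce, on implicits, to pointwise min/max of signed distance values. A small sketch, using analytic sphere SDFs as stand-ins for trained neural implicits:

```python
import numpy as np

def sphere_sdf(x, center, radius):
    """Signed distance to a sphere; stands in for a trained neural implicit."""
    return np.linalg.norm(x - center, axis=-1) - radius

# CSG on implicits is pointwise: union = min, intersection = max,
# difference = max(a, -b). The same operations on meshes require
# intersection tests and remeshing.
def csg_union(a, b):        return np.minimum(a, b)
def csg_intersection(a, b): return np.maximum(a, b)
def csg_difference(a, b):   return np.maximum(a, -b)

p = np.zeros((1, 3))                                  # query the origin
a = sphere_sdf(p, np.array([0.3, 0.0, 0.0]), 0.5)     # -0.2: origin inside A
b = sphere_sdf(p, np.array([2.0, 0.0, 0.0]), 0.5)     #  1.5: origin outside B

u = csg_union(a, b)[0]          # negative: origin is inside the union
i = csg_intersection(a, b)[0]   # positive: origin is not in the intersection
d = csg_difference(a, b)[0]     # negative: origin is inside A minus B
```

The min/max composites are themselves valid implicits (though only approximate SDFs away from the surface), so they can be queried or rendered exactly like the inputs.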
Increasing the size of the network increases the 3D surface accuracy (see Figure 1) and, compared to standard graphics solutions for reducing complexity (mesh decimation and storing signed distances on a regular grid), we achieve higher accuracy for the same memory footprint while maintaining a SIMD representation: n shapes can be represented as n weight-vectors for a fixed architecture. The benefits of converting an existing mesh to a neural implicit extend beyond compression: in approximating the signed distance field (SDF) of the model, neural implicits are both directly usable for many tasks in graphics and geometry processing and preferable in many contexts to traditional representations. Many downstream uses of 3D shapes already mandate the conversion of meshes to less accurate grid-based SDFs, due to the ease and efficiency of computation for SDFs: here, neural implicits serve as a drop-in replacement.
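For intuition on the memory-footprint comparison, a back-of-the-envelope calculation: a fixed MLP's footprint is its parameter count, while a regular SDF grid's footprint grows cubically with resolution. The layer sizes and grid resolution below are illustrative assumptions, not the configurations evaluated in this paper.

```python
def mlp_params(dims):
    """Parameter count (weights + biases) of a fully connected network."""
    return sum(dims[i] * dims[i + 1] + dims[i + 1] for i in range(len(dims) - 1))

# Hypothetical weight-encoded implicit: 3 -> 32 -> 32 -> 32 -> 1.
net_params = mlp_params([3, 32, 32, 32, 1])    # 2,273 parameters
grid_samples = 64 ** 3                         # 262,144 samples on a 64^3 grid

# At 4 bytes per 32-bit float:
net_kb = 4 * net_params / 1024                 # ~8.9 KB for the whole shape
grid_kb = 4 * grid_samples / 1024              # 1024 KB for the grid
```

The fixed architecture is also what makes the representation SIMD-friendly: n shapes are n equally sized weight-vectors, unlike n meshes or n adaptive grids.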



Source code, data, and demo at our (anonymized) repo: https://github.com/u2ni/ICLR2021

