GENERAL NEURAL GAUGE FIELDS

Abstract

The recent advance of neural fields, such as neural radiance fields, has significantly pushed the boundary of scene representation learning. To boost the computation efficiency and rendering quality of 3D scenes, a popular line of research maps the 3D coordinate system to another measuring system, e.g., 2D manifolds or hash tables, for modeling neural fields. This conversion of coordinate systems is typically dubbed a gauge transformation, which is usually a pre-defined mapping function, e.g., an orthogonal projection or a spatial hash function. This raises a question: can we directly learn a desired gauge transformation along with the neural field in an end-to-end manner? In this work, we extend this problem to a general paradigm with a taxonomy of discrete and continuous cases, and develop an end-to-end learning framework to jointly optimize the gauge transformation and the neural field. As the learning of gauge transformations can easily collapse, we derive a general regularization mechanism from the principle of information conservation during the gauge transformation. To circumvent the high computation cost of gauge learning with regularization, we further derive an information-invariant gauge transformation which preserves scene information inherently and yields superior performance.

1. INTRODUCTION

Representing 3D scenes with high efficiency and quality has been a long-standing target in computer vision and computer graphics research. Recently, the implicit representation of neural radiance fields (Mildenhall et al., 2021) has shown that a 3D scene can be modeled with neural networks, achieving compelling visual quality with a low memory footprint. However, it suffers from long training times. Explicit voxel-based methods (Yu et al., 2021a; Sun et al., 2022) emerged with faster convergence but higher memory requirements, due to the use of 3D voxel grids as the scene representation. To strike a good balance between computation efficiency and rendering quality, EG3D (Chan et al., 2022) proposed to project the 3D coordinate system onto a tri-plane system. Along this line of research, TensoRF (Chen et al., 2022) factorizes 3D space into compact low-rank tensors, and Instant-NGP (Müller et al., 2022) models 3D space with multi-resolution hash grids to enable remarkably fast convergence. These recent works (e.g., EG3D and Instant-NGP) share the same prevailing paradigm of converting the 3D coordinates of neural fields to another coordinate system. In particular, a coordinate system of the scene (e.g., 3D coordinates or a hash table) can be regarded as a kind of gauge, and the conversion between coordinate systems can be referred to as a gauge transformation (Moriyasu, 1983). Notably, existing gauge transformations in neural fields are usually pre-defined functions (e.g., orthogonal mappings and spatial hash functions (Teschner et al., 2003)), which are sub-optimal for modeling neural fields, as shown in Fig. 1. To this end, a learnable gauge transformation is preferable, as it can be optimized towards the final objective. This raises an essential question: how to learn gauge transformations along with neural fields?
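To make the two pre-defined gauge transformations mentioned above concrete, the following is a minimal sketch (function names are our own, not from the paper): an orthogonal tri-plane projection in the spirit of EG3D, and a spatial hash of grid coordinates following the XOR-of-primes construction of Teschner et al. (2003) as used by Instant-NGP.

```python
import numpy as np

def triplane_project(xyz):
    """Orthogonal projection of 3D points onto the three axis-aligned
    planes (xy, xz, yz) -- the pre-defined gauge used by tri-plane methods."""
    x, y, z = xyz[..., 0], xyz[..., 1], xyz[..., 2]
    return (np.stack([x, y], axis=-1),
            np.stack([x, z], axis=-1),
            np.stack([y, z], axis=-1))

def spatial_hash(ijk, table_size=2**14):
    """Hash integer grid coordinates into a 1D table index -- the
    pre-defined gauge used by hash-grid methods. Each axis is multiplied
    by a large prime and the results are XOR-ed, modulo the table size."""
    primes = np.array([1, 2654435761, 805459861], dtype=np.uint64)
    h = np.bitwise_xor.reduce(ijk.astype(np.uint64) * primes, axis=-1)
    return (h % np.uint64(table_size)).astype(np.int64)

# Example: map the same 3D points through both gauges.
pts = np.random.rand(5, 3)                                # points in [0, 1)^3
xy, xz, yz = triplane_project(pts)                        # three 2D gauges
idx = spatial_hash(np.floor(pts * 128).astype(np.int64))  # hash-table gauge
```

Both mappings are fixed a priori and agnostic to scene content, which is precisely the limitation that motivates learning the gauge transformation end-to-end.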
Some previous works explore special cases of this problem: e.g., NeuTex (Xiang et al., 2021) and NeP (Ma et al., 2022) aim to transform 3D points onto continuous 2D manifolds. However, a general and unified learning paradigm for various gauge transformations has not yet been established. In this work, we introduce general Neural Gauge Fields, which unify various gauge transformations in neural fields, with a special focus on how to learn the gauge transformations along with neural fields. Basically, a gauge is defined by gauge parameters and a gauge basis, e.g., codebook indices

