GENERAL NEURAL GAUGE FIELDS

Abstract

The recent advance of neural fields, such as neural radiance fields, has significantly pushed the boundary of scene representation learning. Aiming to boost the computation efficiency and rendering quality of 3D scenes, a popular line of research maps the 3D coordinate system to another measuring system, e.g., 2D manifolds and hash tables, for modeling neural fields. The conversion of coordinate systems is typically dubbed a gauge transformation, which is usually a pre-defined mapping function, e.g., an orthogonal projection or a spatial hash function. This raises a question: can we directly learn a desired gauge transformation along with the neural field in an end-to-end manner? In this work, we extend this problem to a general paradigm with a taxonomy of discrete and continuous cases, and develop an end-to-end learning framework to jointly optimize the gauge transformation and the neural field. To counter the problem that the learning of gauge transformations can collapse easily, we derive a general regularization mechanism from the principle of information conservation during the gauge transformation. To circumvent the high computation cost of gauge learning with regularization, we further derive an Information-Invariant gauge transformation which inherently preserves scene information and yields superior performance.

1. INTRODUCTION

Representing 3D scenes with high efficiency and quality has been a long-standing target in computer vision and computer graphics research. Recently, the implicit representation of neural radiance fields (Mildenhall et al., 2021) has shown that a 3D scene can be modeled with neural networks, achieving compelling visual quality with a low memory footprint. However, it suffers from long training times. Explicit voxel-based methods (Yu et al., 2021a; Sun et al., 2022) emerged with faster convergence but higher memory requirements, due to the use of 3D voxel grids as the scene representation. To strike a good balance between computation efficiency and rendering quality, EG3D (Chan et al., 2022) proposed to project the 3D coordinate system onto a tri-plane system. Along this line of research, TensoRF (Chen et al., 2022) factorizes 3D space into compact low-rank tensors, and Instant-NGP (Müller et al., 2022) models the 3D space with multi-resolution hash grids to enable remarkably fast convergence. These recent works (e.g., EG3D and Instant-NGP) share the same prevailing paradigm of converting the 3D coordinates of neural fields to another coordinate system. In particular, a coordinate system of the scene (e.g., a 3D grid or a hash table) can be regarded as a kind of gauge, and the conversion between coordinate systems can be referred to as a gauge transformation (Moriyasu, 1983). Notably, existing gauge transformations in neural fields are usually pre-defined functions (e.g., orthogonal mappings and spatial hash functions (Teschner et al., 2003)), which are sub-optimal for modeling neural fields as shown in Fig. 1. To this end, a learnable gauge transformation is more favorable as it can be optimized towards the final objective. This raises an essential question: how to learn the gauge transformation along with the neural field.
Some previous works explore special cases of this problem, e.g., NeuTex (Xiang et al., 2021) and NeP (Ma et al., 2022) aim to transform 3D points into continuous 2D manifolds. However, a general and unified learning paradigm for various gauge transformations has not yet been established. In this work, we introduce general Neural Gauge Fields, which unify various gauge transformations in neural fields with a special focus on how to learn the gauge transformations along with the neural fields. Basically, a gauge is defined by gauge parameters and a gauge basis, e.g., codebook indices and codebook vectors, which can be continuous or discrete. Thus, the gauge transformations for neural fields can be duly classified into continuous cases (e.g., a tri-plane space) and discrete cases (e.g., a hash codebook space). We then develop general learning paradigms for continuous and discrete cases, which map a 3D point to a continuous coordinate or a discrete index in the target gauge, respectively. As typical cases, we study the continuous mapping from 3D space to a 2D plane and the discrete mapping from 3D space to 256 discrete vectors. As shown in Figs. 2 and 3, we observe that naively optimizing the gauge transformations with the neural fields severely suffers from gauge collapse, meaning the gauge transformation collapses to a small region in the continuous case or to a small number of indices in the discrete case (Baevski et al., 2019; Kaiser et al., 2018). To regularize the learning of gauge transformations, a cycle consistency loss has been explored in Xiang et al. (2021); Ma et al. (2022) to avoid many-to-one mapping; a structural regularization is also adopted in Tretschk et al. (2021) to preserve local structure by only predicting a coordinate offset.

Figure 1: Conceptual illustration of a gauge transformation from 3D point coordinates to a 2D plane. Instead of naively employing a pre-defined orthogonal mapping, which incurs overlap on the 2D plane, the proposed neural gauge fields learn the mapping along with the neural fields, driven by the multi-view synthesis loss.
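The two learning paradigms above (continuous coordinate vs. discrete index) can be illustrated with a minimal sketch. This is a hypothetical toy setup, not the paper's implementation: the network sizes, the `continuous_gauge` and `discrete_gauge` names, and the random initialization are all assumptions for illustration; in practice the mappings would be trained jointly with the neural field via the synthesis loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny MLP mapping 3D points to a continuous 2D gauge (e.g., a plane).
W1 = rng.normal(scale=0.1, size=(3, 64))
W2 = rng.normal(scale=0.1, size=(64, 2))

def continuous_gauge(points):              # points: (N, 3)
    h = np.tanh(points @ W1)               # shared hidden features
    return np.tanh(h @ W2)                 # 2D coordinates in [-1, 1]^2

# Hypothetical discrete gauge: soft assignment over K codebook indices.
K = 256
Wd = rng.normal(scale=0.1, size=(64, K))

def discrete_gauge(points):                # returns a distribution over K indices
    h = np.tanh(points @ W1)
    logits = h @ Wd
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

pts = rng.uniform(-1, 1, size=(5, 3))
uv = continuous_gauge(pts)                 # (5, 2) continuous gauge coordinates
probs = discrete_gauge(pts)                # (5, 256) index distribution
```

With no regularization, nothing prevents `continuous_gauge` from mapping all points to one spot on the plane, or `discrete_gauge` from concentrating all mass on a few indices, which is exactly the gauge collapse described above.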
However, the cycle consistency regularization tends to be heuristic without grounded derivation, while the structural regularization is limited to continuous cases. In this work, we introduce a more principled Information Regularization (InfoReg) derived from the principle of information conservation during gauge transformation. By maximizing the mutual information between gauge parameters, we derive general regularization forms for gauge transformations. Notably, a geometric uniform distribution and a discrete uniform distribution are assumed as the prior distributions of continuous and discrete gauge transformations, respectively, where an Earth Mover's distance and a KL divergence are duly applied to measure the distribution discrepancy for regularization. Learning the gauge transformation with regularization usually incurs a high computation cost, which is infeasible for some practical applications such as fast scene representation. In line with relative information conservation, we directly derive an Information-Invariant (InfoInv) gauge transformation which inherently preserves scene information and obviates the need for regularization. Notably, the derived InfoInv coincides with the form of the positional encoding adopted in NeRF (Mildenhall et al., 2021), which provides a certain rationale for the effectiveness of positional encoding in neural fields. The contributions of this work are threefold. First, we develop a general framework of neural gauge fields which unifies various gauge transformations in neural fields and gives general learning forms for continuous and discrete gauge transformations. Second, we rigorously derive a regularization mechanism for gauge transformation learning from the perspective of information conservation during gauge transformation, which outperforms earlier heuristic approaches.
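For the discrete case, the regularization idea above can be sketched as a KL divergence between the average index usage and the discrete uniform prior. This is a simplified illustration of the principle under assumed inputs, not the paper's exact InfoReg formulation; the `kl_to_uniform` name and the toy arrays are hypothetical.

```python
import numpy as np

def kl_to_uniform(assignments):
    """KL(average index usage || uniform) over K discrete gauge indices.

    assignments: (N, K) soft index probabilities from the gauge network.
    A value near 0 means indices are used uniformly (no gauge collapse);
    a large value signals collapse onto a few indices."""
    K = assignments.shape[1]
    usage = assignments.mean(axis=0)                 # average usage per index
    usage = np.clip(usage, 1e-12, None)              # avoid log(0)
    return float(np.sum(usage * np.log(usage * K)))  # = KL(usage || 1/K)

# Collapsed gauge: every point maps to index 0 -> penalty near log(K).
collapsed = np.zeros((4, 8)); collapsed[:, 0] = 1.0
# Uniform gauge: all indices equally used -> penalty near 0.
uniform = np.full((4, 8), 1.0 / 8)
```

Minimizing this term alongside the synthesis loss pushes the learned discrete gauge towards the uniform prior; the continuous case is analogous but measures the discrepancy to a geometric uniform prior with an Earth Mover's distance.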
Third, we derive an information-invariant gauge transformation (InfoInv) that inherently preserves scene information during gauge transformations.
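Since InfoInv is stated to coincide with the form of NeRF's sinusoidal positional encoding, a sketch of that form helps make the "relative information conservation" claim concrete: for sin/cos features at a fixed frequency, the encoding of a shifted point x+d is an exact rotation of the encoding of x, so relative position information survives the transformation. The function name and the dyadic frequency choice below are assumptions for illustration.

```python
import numpy as np

def infoinv_encode(x, num_freqs=4):
    """Sin/cos encoding at dyadic frequencies, the form InfoInv reduces to
    (matching NeRF-style positional encoding).

    x: (N,) scalar coordinates; returns (N, 2 * num_freqs) features,
    sin features first, cos features second."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi    # pi, 2pi, 4pi, 8pi
    angles = np.outer(x, freqs)                      # (N, num_freqs)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

# Relative-information property at the base frequency f = pi:
# sin(f(x+d)) = sin(fx)cos(fd) + cos(fx)sin(fd), i.e., encoding a shifted
# point is a fixed rotation of the original encoding.
e_x  = infoinv_encode(np.array([0.3]))
e_d  = infoinv_encode(np.array([0.2]))
e_xd = infoinv_encode(np.array([0.5]))
```

Because the shift acts as a rotation in feature space rather than an arbitrary deformation, no information about relative position is destroyed, which is the intuition behind why this transformation needs no extra regularization.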

2. RELATED WORK

Recent work has demonstrated the potential of neural radiance fields (Mildenhall et al., 2021) and its extensions for multifarious vision and graphics applications, including fast view synthesis (Liu et al., 2020; Yu et al., 2021b; Hedman et al., 2021; Lindell et al., 2021; Neff et al., 2021; Yu et al., 2021a; Reiser et al., 2021; Sun et al., 2022), generative models (Schwarz et al., 2020; Niemeyer & Geiger, 2021; Gu et al., 2022; Chan et al., 2021; Or-El et al., 2022), surface reconstruction (Wang et al., 2021; Oechsle et al., 2021; Yariv et al., 2021), etc. Under this context, various gauge transformations have




