TT-NF: TENSOR TRAIN NEURAL FIELDS

Abstract

Learning neural fields has been an active topic in deep learning research, focusing, among other issues, on finding more compact and easy-to-fit representations. In this paper, we introduce a novel low-rank representation termed Tensor Train Neural Fields (TT-NF) for learning neural fields on dense regular grids and efficient methods for sampling from them. Our representation is a TT parameterization of the neural field, trained with backpropagation to minimize a non-convex objective. We analyze the effect of low-rank compression on the downstream task quality metrics in two settings. First, we demonstrate the efficiency of our method in a sandbox task of tensor denoising, which admits comparison with SVD-based schemes designed to minimize reconstruction error. Furthermore, we apply the proposed approach to Neural Radiance Fields, where the low-rank structure of the field corresponding to the best quality can be discovered only through learning.

1. INTRODUCTION

Following the growing interest in deep neural networks, learning neural fields has become a promising research direction in areas concerned with structured representations. However, precision is usually at odds with the computational complexity of these representations, which makes training them and sampling from them a challenge. In this paper, we investigate interpretable low-rank neural fields defined on dense regular grids and efficient methods for learning them. Since, in extreme cases, the dimensionality of such fields can exceed the memory size of a typical computer by several orders of magnitude, we approach the problem of learning such fields through stochastic methods. Tensor decompositions have become a ubiquitous tool for dealing with the structured sparsity of intractable volumes of data. Within the large family of tensor decompositions, we focus on the Tensor Train (TT) (Oseledets, 2011), also known as the Matrix Product State in physics. TT is notable for its high-capacity representation, efficient algebraic operations in the low-rank space, and support of SVD-based algorithms for data approximation. As such, we consider the TT-SVD (Oseledets, 2011) and TT-cross (Oseledets & Tyrtyshnikov, 2010) methods for obtaining a low-rank representation of the full tensor. While TT-SVD requires access to the full tensor at once (which might already be problematic in specific scenarios), TT-cross accesses data through a black-box function, computing (or looking up) elements by their coordinates on demand. Both methods operate under the assumption of noise-free data and are not guaranteed to output sufficiently good approximations in the presence of noise.
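The sequential-SVD scheme referenced above can be illustrated in a few lines of NumPy. This is a minimal sketch of the classic TT-SVD algorithm with a uniform rank cap (the hypothetical `max_rank` argument), not the authors' implementation; `tt_to_full` is a helper added here for checking the reconstruction.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Decompose a full tensor into TT cores via sequential truncated SVDs.

    Core k has shape (r_{k-1}, n_k, r_k), with boundary ranks r_0 = r_d = 1.
    """
    shape = tensor.shape
    cores, r_prev = [], 1
    mat = np.asarray(tensor, dtype=float)
    for n_k in shape[:-1]:
        mat = mat.reshape(r_prev * n_k, -1)       # unfold current mode
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r_k = min(max_rank, len(s))               # rank truncation step
        cores.append(u[:, :r_k].reshape(r_prev, n_k, r_k))
        mat = s[:r_k, None] * vt[:r_k]            # carry remainder forward
        r_prev = r_k
    cores.append(mat.reshape(r_prev, shape[-1], 1))
    return cores

def tt_to_full(cores):
    """Contract TT cores back into the full tensor (for verification only)."""
    res = cores[0].reshape(cores[0].shape[1], -1)  # (n_1, r_1)
    shape = [cores[0].shape[1]]
    for core in cores[1:]:
        r_prev, n_k, r_k = core.shape
        res = (res @ core.reshape(r_prev, n_k * r_k)).reshape(-1, r_k)
        shape.append(n_k)
    return res.reshape(shape)
```

When the input tensor has exact TT-rank at most `max_rank` at every bond, the truncation discards only (numerically) zero singular values and the reconstruction is exact; with noisy data, as the paragraph above notes, such guarantees no longer hold.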
While noise in observations is challenging for SVD-based schemes and requires devising approaches tailored to different noise types and magnitudes (Zhou et al., 2022), exploiting a low-rank structure of the field driven by data is even more challenging (Novikov et al., 2014; Boyko et al., 2020) and typically resorts to the paradigm of data updates through algebraic operations on TT. In this work, we take a step back and leverage the modern deep learning paradigm to parameterize neural fields as TT, coined TT-NF. Using deep learning tooling with support for automatic differentiation and our novel sampling methods, we obtain mini-batches of samples from the parameterized neural field and optimize a non-convex objective defined by a downstream task. The optimization comprises the computation of parameter gradients with backpropagation and parameter updates with any suitable technique, such as SGD. We analyze TT-NF and several sampling techniques on a range of problem sizes and provide reference charts for choosing a sampling method based on memory and computational constraints. Next, we define a synthetic task of low-rank tensor denoising and demonstrate the superiority of the proposed optimization scheme over several SVD-based schemes. Finally, we consider the formulation of Neural Radiance Fields (NeRF) introduced in Mildenhall et al. (2020) and propose a simple modification to TT-NF, termed QTT-NF, for dealing with hierarchical spaces.

Our contributions in this paper:
1. TT-NF: a compressed low-rank neural field representation defined on a dense grid;
2. QTT-NF: a modification of TT-NF for learning neural fields defined on hierarchical spaces, such as 3D voxel grids seen in neural rendering;
3. Efficient algorithms for sampling from (Q)TT-NF and learning it from samples, designed for deep learning tooling.

The rest of the paper is organized as follows: Sec. 2 discusses the related work; Sec. 3 introduces notations from the relevant domains; Sec. 4 presents the proposed contributions; Sec. 5 demonstrates the practical use of the proposed methods; Sec. 6 concludes the paper. Many relevant details pertaining to our method, experiments, and extra discussion can be found in Appendix sections A, B, and C.
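The key primitive behind mini-batch training of such a field is evaluating entries at a batch of grid coordinates without ever materializing the full tensor. A minimal NumPy sketch of this gather operation (the function name `tt_gather` and the uniform-rank setup are assumptions for illustration; the paper's sampling methods are more elaborate, and in practice the cores would be trainable parameters in an autodiff framework so that gradients flow through this contraction):

```python
import numpy as np

def tt_gather(cores, indices):
    """Evaluate TT entries at a batch of multi-indices.

    cores:   list of arrays, core k of shape (r_{k-1}, n_k, r_k)
    indices: int array (batch, num_dims), one coordinate per tensor mode
    returns: array (batch,) of field values, at O(d * r^2) cost per sample
    """
    v = cores[0][0, indices[:, 0], :]              # (batch, r_1)
    for k in range(1, len(cores)):
        slices = cores[k][:, indices[:, k], :]     # (r_{k-1}, batch, r_k)
        v = np.einsum('br,rbs->bs', v, slices)     # contract the bond dim
    return v[:, 0]                                 # boundary rank r_d = 1
```

The memory footprint per batch is a single (batch, rank) matrix, which is what makes stochastic optimization of fields too large to store feasible.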

2. RELATED WORK

Tensor Decompositions Higher-order tensor decompositions have been found helpful for several data-based problems, as detailed by Kolda & Bader (2009). Oseledets (2011) introduced the Tensor Train (TT) decomposition, which offers a compressed low-rank tensor approximation that is stable and fast. The TT decomposition has also been used to approximate tensors with linear complexity in their dimensionality via the TT-cross approximation (Oseledets & Tyrtyshnikov, 2010). With the rise of deep learning, tensor-based methods have been integrated into neural networks, e.g., Usvyatsov et al. (2021) explored the use of TT-cross approximation for gradient selection in learning representations. We review tensor-based methods for network compression in the next paragraph and refer the reader to Panagakis et al. (2021) for a detailed overview of similar works. On the software side, along with general deep learning frameworks (Paszke et al., 2019; Abadi et al., 2015), several tensor-centric frameworks have emerged (Kossaifi et al., 2019b; Usvyatsov et al., 2022; Novikov et al., 2020).

Low-rank bases were utilized by Jaderberg et al. (2014) to approximate convolutional filters and drastically speed up inference by separating filter depth from spatial dimensions. Lebedev et al. (2014) applied a low-rank decomposition to all four dimensions of standard convolutional kernel tensors. Subsequent works employed more general tensor decompositions, notably the TT decomposition, to massively compress fully connected layers (Novikov et al., 2015) or both fully connected and convolutional layers (Garipov et al., 2016), with minor accuracy losses. Kossaifi et al. (2019a) applied higher-order tensor factorization to the entire network instead of separately to individual layers. In a similar vein, Li et al. (2019); Obukhov et al. (2020); Kanakis et al. (2020) propose to learn a basis and coefficients of each layer, thus enabling disentangled compression and multitask learning. While most of the aforementioned methods examine general convolutional networks, we focus specifically on compressing neural fields. The TT decomposition has also been used by Boyko et al. (2020) to compress 3D scenes represented by volumetric distance functions. We review neural-field-based methods separately in the next paragraph.

Neural Fields Neural fields as implicit scene representations for geometry and radiance have recently attracted intense research activity, especially in the context of 3D. The application of neural fields to image compression is studied by Strümpler et al. (2021), who employ meta-learned representations that increase training efficiency. The usual volumetric type of representation is replaced by a surface-based one by Zhang et al. (2021a), who learned bidirectional reflectance

