DIFFERENTIABLE RENDERING WITH REPARAMETERIZED VOLUME SAMPLING

Abstract

We propose an alternative rendering algorithm for neural radiance fields based on importance sampling. In view synthesis, a neural radiance field approximates underlying density and radiance fields based on a sparse set of scene views. To generate a pixel of a novel view, it marches a ray through the pixel and computes a weighted sum of radiance emitted from a dense set of ray points. This rendering algorithm is fully differentiable and facilitates gradient-based optimization of the fields. However, in practice, only a tiny opaque portion of the ray contributes most of the radiance to the sum. Therefore, we can avoid computing radiance for the rest of the ray. In this work, we use importance sampling to pick non-transparent points on the ray. Specifically, we generate samples according to the probability distribution induced by the density field. Our main contribution is the reparameterization of the sampling algorithm. It allows end-to-end learning with gradient descent, as in the original rendering algorithm. With our approach, we can optimize a neural radiance field with just a few radiance field evaluations per ray. As a result, we alleviate the costs associated with the color component of the neural radiance field at the additional cost of the density sampling algorithm.

1. INTRODUCTION

We propose a volume rendering algorithm for learning 3D scenes and generating novel views. Recently, learning-based approaches have led to significant progress in this area. As an early instance, (20) represent a scene via a density field and a radiance (color) field parameterized with an MLP. Using a differentiable volume rendering algorithm (18) with the MLP-based fields to produce images, they minimize the discrepancy between the output images and a set of reference images to learn a scene representation. The algorithm we propose is a drop-in replacement for the volume rendering algorithm used in NeRF (20) and follow-ups. The underlying model in NeRF generates an image point in the following way. It casts a ray from a camera through the point and defines the point color as a weighted sum along the ray. The sum aggregates the radiance of each ray point with weights induced by the density field. Each term involves a costly neural network query, so the model faces a trade-off between rendering quality and computational load. NeRF obtained a better trade-off with a two-stage sampling algorithm that obtains ray points with higher weights. The algorithm is reminiscent of importance sampling, yet it requires training an auxiliary model. In this work, we propose a rendering algorithm based on importance sampling. Our algorithm also acts in two stages. In the first stage, it marches through the ray to estimate density. In the second stage, it constructs a Monte-Carlo color approximation, using the density to pick points along the ray. Figure 1 illustrates the estimates for a varying number of samples. The resulting estimate is fully differentiable and does not require any auxiliary models. Moreover, we only need a few samples to construct a precise color approximation. Intuitively, we only need to compute the radiance of the point where a ray hits a solid surface.
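The second stage described above amounts to sampling ray locations from the distribution induced by the density field. As a rough illustration of this idea (not the paper's exact reparameterization), the sketch below draws samples with inverse-CDF sampling in NumPy; the function name, the piecewise-constant density input, and the piecewise-linear CDF inversion are our simplifying assumptions:

```python
import numpy as np

def sample_ray_points(t_grid, sigma, n_samples, rng):
    """Draw ray locations t approximately distributed as p_r(t).

    t_grid: (m+1,) bin edges t_n = t_0 < ... < t_m = t_f
    sigma:  (m,) density treated as piecewise constant per bin
    """
    delta = np.diff(t_grid)                                  # bin widths
    tau = np.concatenate([[0.0], np.cumsum(sigma * delta)])  # optical depth
    # Opacity 1 - exp(-tau) acts as the (unnormalized) CDF of t.
    cdf = 1.0 - np.exp(-tau)
    cdf /= cdf[-1]                                           # normalize
    u = rng.uniform(size=n_samples)                          # base noise
    # Invert the CDF (piecewise-linear approximation between grid points).
    return np.interp(u, cdf, t_grid)
```

Because the base noise u is drawn independently of the density, samples are differentiable functions of the density values wherever the interpolation is smooth, which is the intuition behind reparameterized sampling.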
As a result, our algorithm is especially suitable for recent architectures (23; 36; 32) that use distinct models to parameterize radiance and density. Specifically, the first stage only queries the density field, whereas the second stage only queries the radiance field. Compared to the standard rendering algorithm, the second stage of our algorithm avoids redundant radiance queries and reduces the memory required for rendering at the cost of a slight increase in estimate variance.
Figure 1: Novel views of a ship generated with the proposed radiance estimates. For each ray, we estimate density and then compute radiance at a few ray points generated using the ray density. Render quality gradually improves with the number of ray points and saturates at approximately 16 ray points.
Below, Section 2 gives a recap of neural radiance fields. Then we proceed to the main contribution of our work in Section 3, namely the rendering algorithm fueled by a novel sampling procedure. Finally, in our experiments in Section 5, we evaluate the algorithm in terms of rendering quality, speed, and memory requirements.

2. NEURAL RADIANCE FIELDS

Neural radiance fields represent 3D scenes with a non-negative scalar density field σ : R³ → R₊ and a vector radiance field c : R³ × R³ → R³. The scalar field σ represents volume density at each spatial location x, and c(x, d) returns the light emitted from spatial location x in direction d, represented as a normalized three-dimensional vector. For novel view synthesis, NeRF adapts a volume rendering algorithm that computes pixel color C(r) (denoted with a capital letter) as the expected radiance for a ray r = o + td passing through a pixel from origin o ∈ R³ in direction d ∈ R³. To ease the notation, we denote density and radiance restricted to ray r as

σ_r(t) := σ(o + td),   (1)
c_r(t) := c(o + td, d).   (2)

With that in mind, the expected radiance along ray r is given as

C(r) = ∫_{t_n}^{t_f} p_r(t) c_r(t) dt,  where  p_r(t) := σ_r(t) exp(-∫_{t_n}^{t} σ_r(s) ds).   (3)

Here, t_n and t_f are the near and far ray boundaries, and p_r(t) is an unnormalized probability density function of a random variable t on ray r. Intuitively, t is the location on the ray where a portion of the light coming into the point o was emitted. To approximate the nested integrals in Equation 3, Max (18) proposed to replace the fields σ_r and c_r with a piecewise approximation on a grid t_n = t_0 < t_1 < ... < t_m = t_f and compute Equation 3 analytically for the approximation. In particular, the piecewise-constant approximation, which is predominant in the NeRF literature, yields the formula

Ĉ(r) = Σ_{i=1}^{m} (1 - exp(-σ_r(t_i) δ_i)) exp(-Σ_{j=1}^{i-1} σ_r(t_j) δ_j) c_r(t_i),  where  δ_i := t_{i+1} - t_i.   (4)

Importantly, Equation 4 is fully differentiable and can be used as part of a gradient-based learning pipeline. Given the ground truth expected color C_gt(r) along r, the optimization objective in NeRF is

L(Ĉ(r), C_gt(r)) = ||Ĉ(r) - C_gt(r)||²₂.   (5)


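For concreteness, the piecewise-constant quadrature of Equation 4 takes only a few lines of code. The sketch below is a generic NumPy implementation of the standard rendering weights; the function name and array-shape conventions are our assumptions:

```python
import numpy as np

def render_color(t_grid, sigma, radiance):
    """Approximate expected color via the piecewise-constant quadrature (Eq. 4).

    t_grid:   (m+1,) grid t_0 < ... < t_m
    sigma:    (m,) density sigma_r(t_i) per bin
    radiance: (m, 3) radiance c_r(t_i) per bin
    """
    delta = np.diff(t_grid)                # delta_i = t_{i+1} - t_i
    alpha = 1.0 - np.exp(-sigma * delta)   # per-bin opacity
    # Transmittance exp(-sum_{j<i} sigma_j delta_j): light surviving to bin i.
    trans = np.exp(-np.concatenate([[0.0], np.cumsum(sigma * delta)[:-1]]))
    weights = alpha * trans                # probability mass per bin
    return weights @ radiance              # estimated pixel color
```

Note that the weights sum to at most one; the deficit corresponds to rays that pass through the scene without hitting dense regions.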