DIFFERENTIABLE RENDERING WITH REPARAMETERIZED VOLUME SAMPLING

Abstract

We propose an alternative rendering algorithm for neural radiance fields based on importance sampling. In view synthesis, a neural radiance field approximates underlying density and radiance fields based on a sparse set of scene views. To generate a pixel of a novel view, it marches a ray through the pixel and computes a weighted sum of radiance emitted from a dense set of ray points. This rendering algorithm is fully differentiable and facilitates gradient-based optimization of the fields. However, in practice, only a tiny opaque portion of the ray contributes most of the radiance to the sum. Therefore, we can avoid computing radiance along the rest of the ray. In this work, we use importance sampling to pick non-transparent points on the ray. Specifically, we generate samples according to the probability distribution induced by the density field. Our main contribution is the reparameterization of the sampling algorithm. It allows end-to-end learning with gradient descent as in the original rendering algorithm. With our approach, we can optimize a neural radiance field with just a few radiance field evaluations per ray. As a result, we alleviate the costs associated with the color component of the neural radiance field at the additional cost of the density sampling algorithm.

1. INTRODUCTION

We propose a volume rendering algorithm for learning 3D scenes and generating novel views. Recently, learning-based approaches have led to significant progress in this area. As an early instance, (20) represent a scene via a density field and a radiance (color) field parameterized with an MLP. Using a differentiable volume rendering algorithm (18) with the MLP-based fields to produce images, they minimize the discrepancy between the output images and a set of reference images to learn a scene representation.

The algorithm we propose is a drop-in replacement for the volume rendering algorithm used in NeRF (20) and follow-ups. The underlying model in NeRF generates an image point in the following way. It casts a ray from a camera through the point and defines the point color as a weighted sum along the ray. The sum aggregates the radiance of each ray point with weights induced by the density field. Each term involves a costly neural network query, so the model faces a trade-off between rendering quality and computational load. NeRF achieved a better trade-off with a two-stage sampling algorithm that picks ray points with higher weights. The algorithm is reminiscent of importance sampling, yet it requires training an auxiliary model.

In this work, we propose a rendering algorithm based on importance sampling. Our algorithm also acts in two stages. In the first stage, it marches through the ray to estimate density. In the second stage, it constructs a Monte-Carlo color approximation using the density to pick points along the ray. Figure 1 illustrates the estimates for a varying number of samples. The resulting estimate is fully differentiable and does not require any auxiliary models. Besides, we only need a few samples to construct a precise color approximation. Intuitively, we only need to compute the radiance of the point where a ray hits a solid surface. As a result, our algorithm is especially suitable for recent architectures (23; 36; 32) that use distinct models to parameterize radiance and density. Specifically, the first stage only queries the density field, whereas the second stage only queries the radiance field. Compared to the standard rendering algorithm, the second stage of our algorithm avoids redundant radiance queries and reduces the memory required for rendering at the cost of slight variance in the estimate.
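To make the contrast between the two rendering schemes concrete, the following is a minimal NumPy sketch of the forward pass only. It is an illustration under stated assumptions, not the implementation described in this paper: the density is assumed piecewise constant on ray bins, the radiance is a toy closed-form function, and all names (optical_depth, render_quadrature, sample_ray_points, render_monte_carlo, toy_radiance) are hypothetical. The sketch draws ray points by inverse transform sampling from the distribution induced by the density. It does not show the gradient computation that the reparameterization enables.

```python
import numpy as np


def optical_depth(sigma, t):
    """Optical depth accumulated at each bin edge (density constant per bin)."""
    return np.concatenate([[0.0], np.cumsum(sigma * np.diff(t))])


def render_quadrature(sigma, radiance, t):
    """Standard weighted sum: needs one radiance value for every bin."""
    depth = optical_depth(sigma, t)
    # w_i = T_i * (1 - exp(-sigma_i * delta_i)) = exp(-depth_i) - exp(-depth_{i+1})
    weights = np.exp(-depth[:-1]) - np.exp(-depth[1:])
    return weights @ radiance


def sample_ray_points(sigma, t, num_samples, rng):
    """Inverse transform sampling of ray points from the density-induced distribution."""
    depth = optical_depth(sigma, t)
    opacity = 1.0 - np.exp(-depth[-1])  # total weight mass along the ray
    u = rng.uniform(size=num_samples)
    # Optical depth at which the (unnormalized) CDF 1 - exp(-depth) equals u * opacity.
    target = -np.log1p(-u * opacity)
    # Locate the bin containing the target depth, then solve linearly inside it.
    idx = np.clip(np.searchsorted(depth, target, side="right") - 1, 0, len(sigma) - 1)
    offset = (target - depth[idx]) / np.maximum(sigma[idx], 1e-12)
    points = np.clip(t[idx] + offset, t[idx], t[idx + 1])
    return points, opacity


def render_monte_carlo(sigma, radiance_fn, t, num_samples, rng):
    """Monte-Carlo color estimate: radiance is queried only at the sampled points."""
    points, opacity = sample_ray_points(sigma, t, num_samples, rng)
    return opacity * radiance_fn(points).mean(axis=0)


def toy_radiance(points):
    """Hypothetical stand-in for a radiance network: RGB as a function of depth."""
    return np.stack(
        [0.5 + 0.5 * np.sin(points), 0.5 + 0.5 * np.cos(points), np.full_like(points, 0.3)],
        axis=-1,
    )


# Toy ray: a thin, nearly opaque "surface" around depth 2.
t = np.linspace(0.0, 4.0, 513)
mid = 0.5 * (t[:-1] + t[1:])
sigma = 50.0 * np.exp(-0.5 * ((mid - 2.0) / 0.1) ** 2)

rng = np.random.default_rng(0)
reference = render_quadrature(sigma, toy_radiance(mid), t)     # 512 radiance queries
estimate = render_monte_carlo(sigma, toy_radiance, t, 8, rng)  # 8 radiance queries
```

In this toy setup the density pass still visits every bin, but the radiance function is evaluated at 8 points instead of 512. Because the samples concentrate where the ray becomes opaque, the estimate stays close to the full weighted sum, with some variance.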

