DIFFERENTIABLE RENDERING WITH REPARAMETERIZED VOLUME SAMPLING

Abstract

We propose an alternative rendering algorithm for neural radiance fields based on importance sampling. In view synthesis, a neural radiance field approximates underlying density and radiance fields based on a sparse set of scene views. To generate a pixel of a novel view, it marches a ray through the pixel and computes a weighted sum of radiance emitted from a dense set of ray points. This rendering algorithm is fully differentiable and facilitates gradient-based optimization of the fields. However, in practice, only a tiny opaque portion of the ray contributes most of the radiance to the sum. Therefore, we can avoid computing radiance in the remaining part. In this work, we use importance sampling to pick non-transparent points on the ray. Specifically, we generate samples according to the probability distribution induced by the density field. Our main contribution is the reparameterization of the sampling algorithm. It allows end-to-end learning with gradient descent, as in the original rendering algorithm. With our approach, we can optimize a neural radiance field with just a few radiance field evaluations per ray. As a result, we alleviate the costs associated with the color component of the neural radiance field at the additional cost of the density sampling algorithm.

1. INTRODUCTION

We propose a volume rendering algorithm for learning 3D scenes and generating novel views. Recently, learning-based approaches have led to significant progress in this area. As an early instance, (20) represent a scene via a density field and a radiance (color) field parameterized with an MLP. Using a differentiable volume rendering algorithm (18) with the MLP-based fields to produce images, they minimize the discrepancy between the output images and a set of reference images to learn a scene representation. The algorithm we propose is a drop-in replacement for the volume rendering algorithm used in NeRF (20) and follow-ups. The underlying model in NeRF generates an image point in the following way. It casts a ray from a camera through the point and defines the point color as a weighted sum along the ray. The sum aggregates the radiance of each ray point with weights induced by the density field. Each term involves a costly neural network query, and the model faces a trade-off between rendering quality and computational load. NeRF obtains a better trade-off with a two-stage sampling algorithm that concentrates ray points in regions with higher weights. The algorithm is reminiscent of importance sampling, yet it requires training an auxiliary model. In this work, we propose a rendering algorithm based on importance sampling. Our algorithm also acts in two stages. In the first stage, it marches through the ray to estimate density. In the second stage, it constructs a Monte Carlo color approximation using the density to pick points along the ray. Figure 1 illustrates the estimates for a varying number of samples. The resulting estimate is fully differentiable and does not require any auxiliary models. Besides, we only need a few samples to construct a precise color approximation. Intuitively, we only need to compute the radiance of the point where a ray hits a solid surface.
As a result, our algorithm is especially suitable for recent architectures (23; 36; 32) that use distinct models to parameterize radiance and density. Specifically, the first stage only queries the density field, whereas the second stage only queries the radiance field. Compared to the standard rendering algorithm, the second stage of our algorithm avoids redundant radiance queries and reduces the memory required for rendering at the cost of a slight increase in estimate variance. Below, Section 2 gives a recap of neural radiance fields. Then we proceed to the main contribution of our work in Section 3, namely the rendering algorithm fueled by a novel sampling procedure. Finally, in our experiments in Section 5 we evaluate the algorithm in terms of rendering quality, speed, and memory requirements.

2. NEURAL RADIANCE FIELDS

Neural radiance fields represent 3D scenes with a non-negative scalar density field σ : R³ → R₊ and a vector radiance field c : R³ × R³ → R³. The scalar field σ represents volume density at each spatial location x, and c(x, d) returns the light emitted from spatial location x in direction d, represented as a normalized three-dimensional vector. For novel view synthesis, NeRF adapts a volume rendering algorithm that computes pixel color C(r) (denoted with a capital letter) as the expected radiance for a ray r = o + td passing through a pixel from origin o ∈ R³ in direction d ∈ R³. To ease the notation, we denote density and radiance restricted to ray r as

σ_r(t) := σ(o + td),   (1)
c_r(t) := c(o + td, d).   (2)

With that in mind, the expected radiance along ray r is given as

C(r) = ∫_{t_n}^{t_f} p_r(t) c_r(t) dt,   where   p_r(t) := σ_r(t) exp(−∫_{t_n}^{t} σ_r(s) ds).   (3)

Here, t_n and t_f are the near and far ray boundaries, and p_r(t) is an unnormalized probability density function of a random variable t on ray r. Intuitively, t is the location on the ray where a portion of the light coming into the point o was emitted. To approximate the nested integrals in Equation 3, Max (18) proposed to replace the fields σ_r and c_r with piecewise approximations on a grid t_n = t_0 < t_1 < … < t_m = t_f and compute Equation 3 analytically for the approximation. In particular, a piecewise constant approximation, which is predominant in the NeRF literature, yields the formula

Ĉ(r) = Σ_{i=1}^{m} (1 − exp(−σ_r(t_i) δ_i)) exp(−Σ_{j=1}^{i−1} σ_r(t_j) δ_j) c_r(t_i),   (4)

where δ_i := t_{i+1} − t_i. Importantly, Equation 4 is fully differentiable and can be used as part of a gradient-based learning pipeline. Given the ground truth expected color C_gt(r) along r, the optimization objective in NeRF

L(Ĉ(r), C_gt(r)) = ‖Ĉ(r) − C_gt(r)‖₂²   (5)

captures the difference between C_gt(r) and the estimated color Ĉ(r).
To reconstruct a scene, NeRF runs a gradient-based optimizer to minimize the objective in Equation 5 averaged across multiple rays and multiple viewpoints. While the above approximation works in practice, it involves multiple evaluations of c and σ along a dense grid. Besides that, a common situation is when a ray intersects a solid surface at some point t̂ ∈ [t_n, t_f]. In this case, the probability density p_r(t) concentrates its mass near t̂ and is close to zero in the other parts of the ray. As a result, most of the terms in Equation 4 make a negligible contribution to the sum. In Section 4, we discuss various solutions to picking the grid points that are most likely to contribute to the sum. As an alternative, in the next section we propose to estimate the expected radiance with stochastic estimates that require only a few radiance evaluations.
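For concreteness, the quadrature of Equation 4 can be sketched as a short NumPy routine (the function name is ours, not the paper's). On a ray with a single near-opaque bin, almost all of the quadrature weight lands on that bin:

```python
import numpy as np

def render_ray(sigma, color, t):
    """Piecewise constant quadrature (Eq. 4) of the expected color.

    sigma[i] and color[i] are field values at knots t[i]; each bin
    [t_i, t_{i+1}) uses the left-knot values.
    """
    delta = np.diff(t)                   # bin widths delta_i
    s = sigma[:-1] * delta               # per-bin optical depth
    alpha = 1.0 - np.exp(-s)             # per-bin opacity
    # transmittance accumulated over the preceding bins
    trans = np.exp(-np.concatenate(([0.0], np.cumsum(s)[:-1])))
    w = alpha * trans                    # quadrature weights
    return w @ color[:-1]
```

With a dense grid, every knot contributes a radiance evaluation regardless of its weight, which is the inefficiency the paper targets.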

3. STOCHASTIC ESTIMATES FOR THE EXPECTED COLOR

The Monte Carlo method gives a natural way to approximate the expected color. For example, given k i.i.d. samples t_1, …, t_k ∼ p_r(t) and the normalization constant y_f := ∫_{t_n}^{t_f} p_r(t) dt, the sum

Ĉ_MC(r) = (y_f / k) Σ_{i=1}^{k} c_r(t_i)   (6)

is an unbiased estimate of the expected radiance in Equation 3. Moreover, the samples t_1, …, t_k come from high-density regions of p_r by design; thus, for a degenerate density p_r, even a few samples provide an estimate with low variance. Each term in Equation 6 contributes equally to the sum. Importantly, unlike the approximation in Equation 4, the Monte Carlo estimate depends on the scene density σ implicitly through the sampling algorithm and requires a custom gradient estimate for the parameters of σ. As an illustration, the full NeRF samples points on a ray from the distribution induced by an auxiliary "coarse" density model. These points are then used as grid knots in the approximation of Equation 4. However, since their sampling algorithm is non-differentiable and cannot be trained end-to-end, they introduce an auxiliary "coarse" radiance field and train the "coarse" components separately. Below, we propose a principled, end-to-end differentiable algorithm to generate samples from p_r(t). We then apply the algorithm to estimate radiance as in Equation 6 and optimize the estimates to reconstruct the density and radiance fields of a scene.

3.1. REPARAMETERIZED EXPECTED RADIANCE ESTIMATES

The solution we propose is primarily inspired by the reparameterization trick (12; 31). We first change the variable in Equation 3. For F_r(t) := 1 − exp(−∫_{t_n}^{t} σ_r(s) ds) and y := F_r(t), we write

C(r) = ∫_{t_n}^{t_f} c_r(t) p_r(t) dt = ∫_{y_n}^{y_f} c_r(F_r^{−1}(y)) dy.   (7)

The integral boundaries are y_n := F_r(t_n) = 0 and y_f := F_r(t_f). The function F_r(t) acts as the cumulative distribution function of the variable t, with the single exception that, in general, y_f = F_r(t_f) ≠ 1. In volume rendering, F_r(t) is called the opacity function, with y_f being equal to the overall pixel opaqueness. On the right-hand side of Equation 7, the integral boundaries depend on the opacity F_r and, as a consequence, on the ray density σ_r. We further simplify the integral by changing the integration boundaries to [0, 1] and substituting y_n = 0:

∫_{y_n}^{y_f} c_r(F_r^{−1}(y)) dy = ∫_0^1 y_f c_r(F_r^{−1}(y_f u)) du.   (8)

Given the above derivation, we construct the reparameterized Monte Carlo (R/MC) estimate for the right-hand side integral in Equation 8 with k i.i.d. U[0, 1] samples u_1, …, u_k:

Ĉ_{R/MC}(r) := (y_f / k) Σ_{i=1}^{k} c_r(F_r^{−1}(y_f u_i)).   (9)

In the above estimate, the random samples u_1, …, u_k do not depend on the volume density σ_r or the color c_r. Essentially, the reparameterized Monte Carlo estimate generates samples from p_r(t) using the inverse cumulative distribution function F_r^{−1}(y_f u). We further improve the estimate using stratified sampling. We replace the uniform samples u_1, …, u_k with uniform independent samples within regular grid bins, v_i ∼ U[(i−1)/k, i/k], i = 1, …, k, and derive the reparameterized stratified Monte Carlo (R/SMC) estimate

Ĉ_{R/SMC}(r) := (y_f / k) Σ_{i=1}^{k} c_r(F_r^{−1}(y_f v_i)).   (10)

It is easy to show that both Equations 9 and 10 are unbiased estimates of Equation 3. Additionally, the gradients of estimates 9 and 10 are unbiased estimates of the gradient of the expected color C(r). However, in practice, we can only query σ_r at certain ray points and cannot compute F_r analytically.
Thus, in the following section, we introduce approximations of F r and its inverse.
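When the inverse opacity is available in closed form, the R/SMC estimate of Equation 10 is only a few lines. A minimal sketch (the helper names are ours), checked on a constant-density ray where F_r is analytically invertible and a constant color makes the estimate exact:

```python
import numpy as np

def rsmc_color(c_r, F_inv, y_f, k, rng):
    """Reparameterized stratified Monte Carlo color estimate (Eq. 10).

    c_r maps an array of ray positions to per-sample colors; F_inv is the
    inverse opacity F_r^{-1}; y_f is the overall pixel opaqueness.
    """
    # v_i ~ U[(i-1)/k, i/k]: stratified uniforms covering [0, 1]
    v = (np.arange(k) + rng.uniform(size=k)) / k
    t = F_inv(y_f * v)                    # inverse-CDF samples from p_r
    return y_f / k * np.sum(c_r(t), axis=0)
```

Only the radiance field is queried at the k sampled points; the density enters solely through F_inv and y_f, which is the split the two-stage algorithm exploits.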

3.2. OPACITY APPROXIMATIONS

The expected radiance estimate in Equation 9 relies on the opacity F_r(t) = 1 − exp(−∫_{t_n}^{t} σ_r(s) ds) and its inverse F_r^{−1}(y). We propose to approximate the opacity using a piecewise density field approximation. Figure 2 illustrates the approximations and ray samples obtained through opacity inversion.

Figure 2: Illustration of opacity inversion. On the left, we approximate the density field σ_r with a piecewise constant (PWC) and a piecewise linear (PWL) approximation. On the right, we approximate the opacity F_r(t) and compute F_r^{−1}(y_f u) for u ∼ U[0, 1].

To construct the approximation, we take a grid t_n = t_0 < t_1 < … < t_m = t_f and construct piecewise constant and piecewise linear approximations. In the piecewise linear case, we compute σ_r at the grid points and interpolate the values between the grid points. In the piecewise constant case, we pick a random point t̃_i within each bin, t_i ≤ t̃_i ≤ t_{i+1}, and approximate the density with σ_r(t̃_i) inside the corresponding bin. Importantly, for a non-negative field, these approximations are also non-negative. Then, for t ∈ [t_i, t_{i+1}), we compute the integral ∫_{t_n}^{t} σ_r(s) ds used in F_r(t) analytically, as a sum of rectangular areas

I_0(t) = Σ_{j=1}^{i} σ_r(t̃_{j−1})(t_j − t_{j−1}) + σ_r(t̃_i)(t − t_i)   (11)

for the piecewise constant approximation, and as a sum of trapezoidal areas

I_1(t) = Σ_{j=1}^{i} (σ_r(t_j) + σ_r(t_{j−1}))/2 · (t_j − t_{j−1}) + (σ_r(t_i) + σ̃_r(t))/2 · (t − t_i),   (12)

where σ̃_r(t) = σ_r(t_i)(t_{i+1} − t)/(t_{i+1} − t_i) + σ_r(t_{i+1})(t − t_i)/(t_{i+1} − t_i) is the interpolated density at t, for the piecewise linear approximation. Given these approximations, we are now able to approximate F_r and y_f in Equation 9. We generate samples on a ray based on the inverse opacity F_r^{−1}(y) by solving the equation

y_f u = F_r(t) = 1 − exp(−∫_{t_n}^{t} σ_r(s) ds)   (13)

for t, where u ∈ [0, 1] is a random sample.
We rewrite the equation as

−log(1 − y_f u) = ∫_{t_n}^{t} σ_r(s) ds   (14)

and note that the integral approximations I_0(t) and I_1(t) are monotonic piecewise linear and piecewise quadratic functions, respectively. We obtain the solution of Equation 14 by first finding the bin that contains the solution and then solving a linear or a quadratic equation. Crucially, the solution t can be seen as a differentiable function of the density field σ_r, and we can back-propagate gradients w.r.t. σ_r through t. In the supplementary materials, we provide explicit formulae for t for both approximations and discuss the solutions crucial for numerical stability. Additionally, we provide an alternative inversion algorithm for the case when ∫_{t_n}^{t} σ_r(s) ds can be computed without approximations. In our experiments, we report results only for the piecewise linear approximation. In our preliminary experiments, the piecewise constant approximation was faster but delivered worse rendering quality.
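Solving Equation 14 under the piecewise constant approximation amounts to locating the bin that brackets the target optical depth and then a linear solve inside it. A NumPy sketch, vectorized over samples (the function name and the small denominator guard are ours):

```python
import numpy as np

def invert_pwc(t, sigma_bins, y):
    """Solve -log(1 - y) = I_0(t) for t under a piecewise constant density.

    t: grid knots, shape (m+1,); sigma_bins: per-bin densities, shape (m,);
    y: target opacity values in [0, y_f).
    """
    target = -np.log1p(-y)                                  # left-hand side of Eq. 14
    # optical depth I_0 accumulated at the knots (monotonic, piecewise linear)
    depth = np.concatenate(([0.0], np.cumsum(sigma_bins * np.diff(t))))
    # bin index containing each target
    i = np.clip(np.searchsorted(depth, target, side='right') - 1,
                0, len(sigma_bins) - 1)
    # linear solve within the bin; guard against near-zero density
    return t[i] + (target - depth[i]) / np.maximum(sigma_bins[i], 1e-10)
```

Because every step (cumsum, gather, division) is differentiable away from bin boundaries, the same computation written in an autodiff framework back-propagates through t to the density values, as the text describes.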

4. RELATED WORK

There are multiple ways to represent the shape of a scene for novel view synthesis. Earlier learning-based approaches rely on implicit representations such as signed distance fields (27; 34; 35) and occupancy fields (19; 25) to represent non-transparent objects. We concentrate on implicit representations based on density fields, pioneered in NeRF (20). Each representation relies on a designated rendering algorithm. In particular, NeRF relies on an emission-absorption optical model developed in (11) with a numerical scheme specified in (18).
Monte Carlo estimates for integral approximations. In this work, we revisit the algorithm introduced to approximate the expected color in (18). Currently, the algorithm is a default solution in a multitude of works on neural radiance fields. The authors of (18) approximate the density and radiance fields with piecewise constant functions along a ray and compute Equation 3 analytically for the approximation. Instead, we reparameterize Equation 3 and construct Monte Carlo estimates for the integral. To compute the estimates in practice, we use piecewise approximations only for the density field. The cumulative distribution function (CDF) used in our estimates involves integrating the density field along a ray. In (15), the authors construct field anti-derivatives to accelerate inference. While they use the anti-derivatives to compute Equation 3 on a grid with fewer knots, the anti-derivatives could be applied in our framework to construct Monte Carlo approximations based on the inverse CDF without resorting to piecewise approximations. In the past decade, integral reparameterizations have become common practice in generative modeling (13; 31) and approximate Bayesian inference (3; 7; 22). Similar to Equation 3, objectives in these areas require optimizing expected values with respect to distribution parameters. We refer readers to (21) for a systematic overview.
Notably, in computer graphics, (17) apply reparameterization to estimate gradients of path-traced images with respect to scene parameters.
Algorithms for picking ray points. As opposed to the numerical scheme in Equation 4, our algorithm only requires evaluating radiance at a sparse set of points sampled from the density field. In (20), the authors use a similar hierarchical scheme to generate ray points using an auxiliary coarse density field. Crucially, unlike our reparameterized importance sampling, the importance sampling algorithm in their work does not allow differentiating with respect to the coarse model parameters. The ad-hoc solution introduced in (20) is to train the coarse model separately using the same rendering objective of Equation 5. Subsequent works propose variations of the scheme: Mip-NeRF (2) merges the coarse and fine models using a scale-aware neural field, and Mip-NeRF 360 (2) distills the coarse density field from the fine field instead of training an auxiliary coarse radiance field. For non-transparent scenes, Unisurf (26) treats the density field as an occupancy field and gradually incorporates root-finding algorithms into volume sampling. Simultaneously, a number of works propose training an auxiliary model to return coarse samples for a given ray. For instance, DoNeRF (24) uses a designated depth oracle network supervised with ground truth depth maps, while TermiNeRF (28) foregoes the depth supervision by distilling the sampling network from a pre-trained NeRF model. Finally, the authors of (1) train a proposal network to generate points on a ray end-to-end, starting with a pre-trained NeRF. The aforementioned works speed up rendering, but the reliance on auxiliary networks hinders the use of faster grid-based architectures and makes the overall scene representation less interpretable.
In contrast to the above works, our algorithm learns to sample points on a ray from scratch in an end-to-end fashion, works with an arbitrary density field, and does not require any auxiliary models.
NeRF acceleration through architecture and sparsity. The above algorithms for picking points on a ray generally aim to reduce the number of field evaluations during rendering. An alternative optimization approach is to reduce the time required to evaluate the field. In the past few years, a variety of architectures combining Fourier features (33) and grid-based features have been proposed (8; 32; 36; 30). Besides grids, some works exploit space partitions based on Voronoi diagrams (29), trees (10; 37), and even hash tables (23). These architectures generally trade off inference speed for parameter count. TensoRF (4) stores the grid tensors in a compressed format to achieve both high compression and fast performance. On top of that, skipping density queries for the empty parts of a scene additionally improves rendering time (14). For novel view synthesis, this idea speeds up rendering during both training and inference (9; 6; 16). Notably, our rendering algorithm works with arbitrary density fields and, as a result, is compatible with the improved field architectures and sparse fields.

5. EXPERIMENTS

5.1. IMPORTANCE SAMPLING FOR A SINGLE RAY

We begin with a comparison of importance sampling color estimates in a one-dimensional setting. In this experiment, we assume that the density is known in advance and show how the estimate variance depends on the number of radiance calls. Compared to importance sampling, the standard approximation from Equation 4 has zero variance but does not allow controlling the number of radiance calls. Our experiment models light propagation along a single ray in three typical situations. The upper row of Figure 3 defines a scalar radiance field c_r(t) (orange) and opacity functions F_r(t) (blue) for
• a "foggy" density field. It models a semi-transparent volume. Similar fields occur after model initialization during density field training;
• a "glass and wall" density field. It models light passing through nearly transparent volumes such as glass. The light is emitted at three points: the inner and outer surfaces of the transparent volume and an opaque volume near the end of the ray;
• a "wall" density field. Light is emitted from a single point on a ray. Such fields are most common in applications.
For the three fields, we estimated the expected radiance C(r) = ∫_{t_n}^{t_f} c_r(t) dF_r(t). We considered two baseline methods (both in red in Figure 3): an importance sampling estimate of C obtained with the uniform distribution on a ray, U[t_n, t_f], and its stratified modification with a uniform grid t_n = t_0 < … < t_k = t_f (note that here we use k to denote the number of samples, not the number of grid points m in the piecewise density approximation):

Ĉ_IW(r) = Σ_{i=1}^{k} (t_i − t_{i−1}) c_r(τ_i) dF_r/dt |_{t=τ_i},

with independent τ_i ∼ U[t_{i−1}, t_i]. We compared the baselines against the estimate from Equation 9 and its stratified counterpart from Equation 10. All estimates are unbiased.
Therefore, we only compared the estimate variances for a varying number of samples k. In all setups, our stratified estimate uniformly outperformed the baselines. For the most challenging "foggy" field, approximately k = 32 samples were required to match the baseline performance at k = 128. We matched the baselines with only k = 4 samples for the other fields. Importance sampling requires only a few points for degenerate distributions. In further experiments, we take k = 32 to obtain a precise color estimate even when the model has not converged to a degenerate distribution.
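The variance gap between the uniform baseline Ĉ_IW and the reparameterized estimate can be reproduced in a toy 1D setting. A sketch under assumed fields (a constant density σ_r ≡ s that concentrates p_r near the ray origin, and radiance c_r(t) = t; both are our illustrative choices, not the paper's exact test fields):

```python
import numpy as np

rng = np.random.default_rng(0)
s, t_f, k, trials = 20.0, 1.0, 4, 2000
y_f = 1.0 - np.exp(-s * t_f)                       # overall opacity
true_C = 1.0 / s - np.exp(-s) * (1.0 + 1.0 / s)    # closed form of ∫ t·s·e^{-st} dt

# Uniform importance sampling baseline: tau ~ U[0, t_f], weight p_r(tau)/(1/t_f)
tau = rng.uniform(0.0, t_f, size=(trials, k))
est_unif = (t_f * tau * s * np.exp(-s * tau)).mean(axis=1)

# Reparameterized stratified estimate (Eq. 10): F_r invertible in closed form
v = (np.arange(k) + rng.uniform(size=(trials, k))) / k   # v_i ~ U[(i-1)/k, i/k]
t = -np.log1p(-y_f * v) / s                              # t = F_r^{-1}(y_f v)
est_rep = (y_f * t).mean(axis=1)
```

Both estimators average to the true integral, but the reparameterized samples land where p_r has mass, so the per-trial variance drops sharply for this concentrated field.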

5.2. SCENE RECONSTRUCTION WITH REPARAMETERIZED VOLUME SAMPLING

Next, we apply our algorithm to 3D scene reconstruction based on a set of image projections. As a benchmark, we use the synthetic data from NeRF (20). The primary goal of the experiment is to demonstrate the computational advantages of our algorithm compared to the basic volume rendering algorithm. Metrics are calculated over test views for the synthetic scenes (20) with k = 32 points in color estimates and m = 256 knots along each ray in our NeRF modification; for details, please see Section 3.2. Our method is slightly worse than NeRF without hierarchical sampling (coarse model) in terms of average PSNR and SSIM, although it is slightly better in terms of average LPIPS. As we have only modified the underlying integration scheme, we expected the model performance to match the non-hierarchical NeRF. For LPIPS calculation, we used the official implementation (38) with VGG features. Similarly, our modification of DVGO (32) is slightly worse than the original DVGO.

5.2.1. NEURAL RADIANCE FIELDS

As our first model, we took the original NeRF architecture (20) and hyperparameters without any modifications, except for the output activation of the field σ. In particular, we used a Softplus activation with β = 10 instead of ReLU to avoid zero gradients. To form a training batch, we took a random subset of 64 training images and sampled 64 rays per image. To construct a piecewise linear density approximation, we slightly perturbed a uniform ray grid with 256 knots. For the proposed importance sampling reparameterization, we calculated Equation 10 with k = 32 samples to estimate color. In Table 1, we report the obtained results. For reference, we also provide the NeRF metrics with and without hierarchical sampling. With the NeRF architecture, we expected our algorithm to be comparable to NeRF without hierarchical sampling: the two models use grids of the same density and do not rely on hierarchical sampling. However, despite our expectations, the quantitative results of the baseline were slightly better. The only difference between NeRF without hierarchical sampling and our model is the underlying expected color approximation. We speculate that the variance of our stochastic estimate prevents the model from finding a finer optimum. For reference, we also provide the result for the full NeRF model; however, this model is not directly comparable to ours. Even though the full NeRF model also samples points along a ray in a two-stage manner, it re-weights the output points using a second "fine" network, whereas samples in our model are weighted uniformly (see Equation 10). Speed represents the average rendering time of a single 800 × 800 frame on an NVIDIA V100 GPU. We measured speed and memory usage in pytorch3d's re-implementation of NeRF, as our implementation is also written in PyTorch. Our algorithm slightly improves rendering time and memory footprint.


Then, we evaluated the proposed method with a varying number of samples at the inference stage (k in Equation 10). We took the Lego scene model from the previous experiment and varied the number of points in our reparameterized color estimate. The quantitative results of this experiment can be found in Table 2, and Figure 4 contains qualitative results. From the rendering quality viewpoint, the three metrics gradually increased with the number of samples and saturated at approximately 16 points. Our algorithm produced sensible renders even for k = 1; however, noise artifacts only disappeared at k = 8.

The expression above is a combination of a logarithm and an exponent. We rewrite it to replace the combination with the more reliable logsumexp operator. In practice, for opaque rays, exp(−∫_{t_n}^{t_f} σ_r(s) ds) ≈ 0 and the implementation of logsumexp becomes computationally unstable. In this case, we replace y with u, as they are almost identical.

A.3 IMPLICIT INVERSE OPACITY GRADIENTS

To compute the estimates in Equation 9, we need to compute the inverse opacity F_r^{−1}(y) along with its gradient. In the main paper, we invert the opacity explicitly with a differentiable algorithm. Alternatively, we could invert F_r(t) = 1 − exp(−∫_{t_n}^{t} σ_r(s) ds) with binary search. The opacity F_r(t) is a monotonic function, and for y ∈ (y_n, y_f) = (F_r(t_n), F_r(t_f)) the inverse lies in (t_n, t_f). To compute F_r^{−1}(y), we start with the boundaries t_l = t_n and t_r = t_f and gradually decrease the gap between the boundaries based on a comparison of F_r((t_l + t_r)/2) with y. Importantly, such a procedure is easy to parallelize across multiple inputs and multiple rays. However, we cannot back-propagate through the binary search iterations and need a workaround to compute the gradient ∂t/∂θ of t(θ) = F_r^{−1}(y, θ). To do this, we follow (5) and differentiate both sides of the equation y(θ) = F_r(t, θ). We solve Equation 29 for ∂t/∂θ and substitute the partial derivatives using Equations 30 and 31 to obtain the final expression for the gradient in Equation 32. In our implementation, we use automatic differentiation to compute ∂y/∂θ and ∂/∂θ ∫_{t_n}^{t} σ_r(s) ds, and combine the results as in Equation 32.
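The forward pass of this alternative can be sketched as a vectorized bisection (a minimal NumPy sketch with our own function name; in a full implementation, the gradient would come from the implicit formula of Equation 32 rather than from differentiating through the loop):

```python
import numpy as np

def invert_by_bisection(F, y, t_lo, t_hi, iters=60):
    """Invert a monotonically increasing opacity F on [t_lo, t_hi] by bisection.

    Runs in parallel over an array of targets y; each iteration halves
    the bracketing interval, so `iters` controls the precision.
    """
    y = np.asarray(y, dtype=float)
    lo = np.full_like(y, t_lo)
    hi = np.full_like(y, t_hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        go_right = F(mid) < y          # solution lies to the right of mid
        lo = np.where(go_right, mid, lo)
        hi = np.where(go_right, hi, mid)
    return 0.5 * (lo + hi)
```

The comparisons and `where` selections map directly onto batched tensor operations, which is why the procedure parallelizes well across samples and rays.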



On the coarse stage, we observed at least a 20%-30% improvement in training time. The improvement can be attributed to fewer radiance samples. Using the auxiliary mask for empty space regions, DVGO significantly reduces the number of radiance calls on the second stage. As a result, our rendering algorithm improved training time only when the number of samples was lower than the average number of radiance calls in DVGO (13.4 in this case). At the same time, as the PSNR column indicates, the rendering quality deteriorated with fewer radiance samples. Notably, in our experiments, the model trained with 64 samples achieved PSNR 34.33 even with 16 samples during the evaluation stage. We conclude that the lower PSNRs are caused by the estimate variance during training.

6. CONCLUSION

In this work, we proposed an alternative rendering algorithm for novel view synthesis models based on radiance fields. The core of our contribution is an end-to-end differentiable ray point sampling algorithm. For two pre-existing architectures, we showed that the algorithm can achieve competitive rendering quality while reducing training and rendering time and the required GPU memory. Besides that, we believe that such an algorithm opens up new possibilities in efficient rendering and architecture design that are yet to be explored.



Figure 1: Novel views of a ship generated with the proposed radiance estimates. For each ray we estimate density and then compute radiance at a few ray points generated using the ray density. As the above images indicate, render quality gradually improves with the number of ray points and saturates at approximately 16 ray points.

Figure 3: Color estimate variance compared for a varying number of samples. The upper plot illustrates the underlying opacity function on a ray; the lower graph depicts variance in logarithmic scale. Compared to a naive importance sampling estimate (dashed red), reparameterized sampling exhibits lower variance (dashed green). Stratified sampling improves variance in both setups (solid lines).

Figure 4: NeRF rendering results with a varying number of samples in the proposed stratified estimate with re-sampling. From left to right and from top to bottom: estimates with 1, 2, 4, 8, and 32 points, and ground truth for reference.

−log(1 − y_f u) = −log( exp(log(1 − u)) + exp(log u − ∫_{t_n}^{t_f} σ_r(s) ds) ).

and compute the differentials of the right- and left-hand sides of the equation y(θ) = F_r(t, θ).   (29)

For F_r(t, θ) we have

∂F_r/∂t = (1 − F_r(t, θ)) σ_r(t, θ),   (30)
∂F_r/∂θ = (1 − F_r(t, θ)) ∂/∂θ ∫_{t_n}^{t} σ_r(s, θ) ds.   (31)

∂t/∂θ = ( ∂y/∂θ − (1 − F_r(t, θ)) ∂/∂θ ∫_{t_n}^{t} σ_r(s, θ) ds ) / ( (1 − F_r(t, θ)) σ_r(t, θ) ).   (32)

Table 1: Rendering quality comparison with NeRF and DVGO.

Table 2: Ablation study and comparison in terms of speed and quality with a varying number of points in the importance-weighted color estimate. We compare inference on views of the Lego scene.

5.2.2. DIRECT VOXEL GRID OPTIMIZATION

We also tested our rendering algorithm on a recent voxel-based radiance field model, DVGO (32). The model takes only a few minutes to train thanks to its lightning-fast architecture, progressive scaling, and custom CUDA kernels. We took the official implementation and only replaced the rendering algorithm based on Equation 4. To achieve rendering performance competitive with their CUDA kernels, we optimized the performance using the just-in-time compilation module in PyTorch. We evaluated the two training stages of DVGO with a varying number of radiance samples. On the first "coarse" stage, the model fits a low-resolution 3D density grid and a view-independent 3D radiance grid. On the second "fine" stage, the model fits a density grid with gradually improving resolution and a view-dependent radiance field combining a grid and an MLP. Crucially, the second stage relies on the coarse grid from the first stage to skip empty regions in space and optimize performance. Table 3 reports the results.

A APPENDIX

Below we discuss caveats and implementation details of our sampling algorithm.

A.1 INVERSE FUNCTIONS FOR DENSITY INTEGRALS

In this section, we derive explicit formulae for the density integral inverse used in inverse opacity.

A.1.1 PIECEWISE CONSTANT APPROXIMATION INVERSE

We start with the formula for the integral I_0(t) and solve the equation

y = I_0(t)   (17)

for t. For t ∈ [t_i, t_{i+1}), Equation 17 is a linear equation with solution

t = t_i + ( y − Σ_{j=1}^{i} σ_r(t̃_{j−1})(t_j − t_{j−1}) ) / σ_r(t̃_i).   (18)

In our implementation, we add a small ε to the denominator to improve stability when σ_r(t̃_i) ≈ 0.

A.1.2 PIECEWISE LINEAR APPROXIMATION INVERSE

The piecewise linear density approximation yields a piecewise quadratic function I_1(t). Again, we solve

y = I_1(t)   (20)

for t. We change the variable to the normalized offset ∆t := (t − t_i)/(t_{i+1} − t_i) and note that the terms a and c in the quadratic equation

0 = a∆t² + b∆t + c   (21)

will be

a = (σ_r(t_{i+1}) − σ_r(t_i))(t_{i+1} − t_i)/2,   (22)
c = Σ_{j=1}^{i} (σ_r(t_j) + σ_r(t_{j−1}))/2 · (t_j − t_{j−1}) − y,   (23)

and with a few algebraic manipulations we find the linear term

b = σ_r(t_i) × (t_{i+1} − t_i).   (24)

Since our integral monotonically increases, we can deduce that the root ∆t must be

∆t = (−b + √(b² − 4ac)) / (2a),   (25)

which reduces to the linear solution −c/b as a → 0.
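Using the normalization ∆t := (t − t_i)/(t_{i+1} − t_i), the per-bin quadratic solve can be sketched as follows (a hypothetical helper of our own; the a → 0 branch falls back to the linear case):

```python
import numpy as np

def solve_bin_quadratic(sig_i, sig_ip1, t_i, t_ip1, y_minus_I):
    """Solve y = I_1(t) inside the bin [t_i, t_ip1].

    sig_i, sig_ip1: density at the bin's knots; y_minus_I: y - I_1(t_i),
    the remaining optical depth to accumulate inside this bin.
    """
    d = t_ip1 - t_i
    a = 0.5 * (sig_ip1 - sig_i) * d    # quadratic term
    b = sig_i * d                      # linear term (Eq. 24)
    c = -y_minus_I                     # c = I_1(t_i) - y
    if abs(a) < 1e-12:
        u = -c / b                     # density constant in the bin: linear solve
    else:
        u = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)   # increasing root
    return t_i + u * d                 # map normalized offset back to t
```

The chosen root is the one continuous in a, so the solver behaves smoothly as the density slope within a bin passes through zero.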

