VOGE: A DIFFERENTIABLE VOLUME RENDERER USING GAUSSIAN ELLIPSOIDS FOR ANALYSIS-BY-SYNTHESIS

Abstract

Gaussian reconstruction kernels were proposed by Westover (1990) and studied by the computer graphics community in the 1990s; they provide an alternative representation of 3D object geometry to meshes and point clouds. In contrast, current state-of-the-art (SoTA) differentiable renderers, e.g., Liu et al. (2019), use rasterization to collect triangles or points at each image pixel and blend them based on viewing distance. In this paper, we propose VoGE, which utilizes volumetric Gaussian reconstruction kernels as geometric primitives. The VoGE rendering pipeline uses ray tracing to capture the nearest primitives and blends them as mixtures based on their volume density distributions along the rays. To render efficiently with VoGE, we propose an approximate closed-form solution for the volume density aggregation and a coarse-to-fine rendering strategy. Finally, we provide a CUDA implementation of VoGE, which enables real-time rendering at speeds competitive with PyTorch3D. Quantitative and qualitative experimental results show that VoGE outperforms SoTA counterparts when applied to various vision tasks, e.g., object pose estimation, shape/texture fitting, and occlusion reasoning.
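The closed-form aggregation mentioned above can be illustrated for a single kernel: restricted to a ray, the density of an isotropic Gaussian kernel is itself a 1D Gaussian in the ray parameter, so its integral can be written with the error function. Below is a minimal NumPy sketch under that simplifying assumption; the function name and parameterization are ours, and this is not the paper's CUDA implementation, which handles anisotropic ellipsoids and aggregates many kernels per ray.

```python
import math
import numpy as np

def ray_gaussian_integral(o, d, mu, sigma_sq, weight=1.0, t_max=math.inf):
    """Integral of an isotropic Gaussian kernel's density along the ray
    r(t) = o + t*d (d unit length), accumulated over t in (-inf, t_max].

    The kernel density is weight * exp(-||x - mu||^2 / (2*sigma_sq)); along
    the ray this is a 1D Gaussian in t, so the integral has a closed form.
    """
    v = np.asarray(mu, dtype=float) - np.asarray(o, dtype=float)
    l = float(np.dot(d, v))              # ray depth of the point closest to mu
    m_sq = float(np.dot(v, v)) - l * l   # squared perpendicular distance to mu
    s = math.sqrt(sigma_sq)
    peak = weight * math.exp(-m_sq / (2.0 * sigma_sq))
    # CDF-style tail factor: 1 for the full ray, 0.5 when t_max sits at the peak.
    if math.isinf(t_max):
        tail = 1.0
    else:
        tail = 0.5 * (1.0 + math.erf((t_max - l) / (s * math.sqrt(2.0))))
    return peak * s * math.sqrt(2.0 * math.pi) * tail
```

Having the partial integral up to an arbitrary depth `t_max` in closed form is what lets occlusion be evaluated without stepping along the ray.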

1. INTRODUCTION

Recently, the integration of deep learning and computer graphics has driven significant advances in many computer vision tasks, e.g., pose estimation Wang et al. (2020a), 3D reconstruction Zhang et al. (2021), and texture estimation Bhattad et al. (2021). Although rendering quality has improved significantly over decades of computer graphics development, the differentiability of the rendering process remains to be explored and improved. Specifically, differentiable renderers compute gradients w.r.t. the image formation process, and hence can propagate cues from 2D images back to the parameters of computer graphics models, such as camera parameters, object geometry, and textures. This ability is also essential when combining graphics models with deep neural networks. In this work, we focus on developing a differentiable renderer using an explicit object representation, i.e., Gaussian reconstruction kernels, which can be used either on its own for image generation or as a 3D-aware neural network layer.

The traditional rendering process typically involves naive rasterization Kato et al. (2018), which projects geometric primitives onto the image plane and captures only the nearest primitive for each pixel. However, this process eliminates cues from the occluded primitives and blocks gradients toward them. Rasterization also imposes a limitation on differentiable rendering: it assumes that primitives do not overlap and are ordered front to back along the viewing direction Zwicker et al. (2001). This assumption raises a paradox: during gradient-based optimization, primitives must overlap with each other whenever they change order along the viewing direction. Liu et al. (2019) provide a simple solution that tracks a set of nearest primitives for each image pixel and blends them based on viewing distance. However, such
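The distance-based blending of Liu et al. (2019) described above can be sketched as a softmax over depth applied to the set of primitives tracked at one pixel. The sketch below is schematic only (names and the single-temperature parameterization are ours); the actual soft rasterizer additionally weights each primitive by a screen-space coverage term.

```python
import numpy as np

def soft_blend(colors, depths, gamma=0.1):
    """Blend the colors of the primitives tracked for one pixel,
    weighting nearer primitives (smaller viewing distance) more heavily.

    colors: (K, C) array of per-primitive colors.
    depths: (K,) array of per-primitive viewing distances.
    gamma:  temperature; smaller values approach hard z-buffering.
    """
    depths = np.asarray(depths, dtype=float)
    colors = np.asarray(colors, dtype=float)
    # Softmax over negative depth: the nearest primitive gets the largest weight.
    w = np.exp(-(depths - depths.min()) / gamma)  # shift for numerical stability
    w = w / w.sum()
    return (w[:, None] * colors).sum(axis=0)
```

Because every tracked primitive receives a nonzero weight, gradients flow to occluded primitives as well, which is exactly what hard rasterization prevents.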

Code availability: https://github.com/Angtian/VoGE.

