PAC-NERF: PHYSICS AUGMENTED CONTINUUM NEURAL RADIANCE FIELDS FOR GEOMETRY-AGNOSTIC SYSTEM IDENTIFICATION

Abstract

Existing approaches to system identification (estimating the physical parameters of an object) from videos assume known object geometries. This precludes their applicability in the vast majority of scenes, where object geometries are complex or unknown. In this work, we aim to identify the parameters characterizing a physical system from a set of multi-view videos without any assumption on object geometry or topology. To this end, we propose "Physics Augmented Continuum Neural Radiance Fields" (PAC-NeRF) to estimate both the unknown geometry and physical parameters of highly dynamic objects from multi-view videos. We design PAC-NeRF to only ever produce physically plausible states by enforcing the neural radiance field to follow the conservation laws of continuum mechanics. For this, we design a hybrid Eulerian-Lagrangian representation of the neural radiance field: we use an Eulerian grid representation for the NeRF density and color fields, while advecting the neural radiance fields via Lagrangian particles. This hybrid Eulerian-Lagrangian representation seamlessly blends efficient neural rendering with the material point method (MPM) for robust differentiable physics simulation. We validate the effectiveness of our proposed framework on geometry and physical parameter estimation over a vast range of materials, including elastic bodies, plasticine, sand, and Newtonian and non-Newtonian fluids, and demonstrate a significant performance gain on most tasks.¹

1. INTRODUCTION

Inferring the geometric and physical properties of an object directly from visual observations is a long-standing challenge in computer vision and artificial intelligence. Current machine vision systems are unable to disentangle the geometric structure of the scene, the dynamics of moving objects, and the mechanisms underlying the imaging process, an innate cognitive capability of human perception. For example, by merely watching someone kneading and rolling dough, we are able to disentangle the dough from background clutter, form a predictive model of its dynamics, and estimate physical properties, such as its consistency, well enough to replicate the recipe. There exists a large body of work on inferring the geometric (extrinsic) structure of the world from multiple images (e.g., structure-from-motion (Hartley & Zisserman, 2003)). This has been bolstered by recent approaches leveraging differentiable rendering pipelines (Tewari et al., 2022) and neural scene representations (Xie et al., 2022), unlocking a new level of performance and visual realism. On the other hand, approaches to extract the physical (intrinsic) properties (e.g., mass, friction, viscosity) from images are yet nascent (Jatavallabhula et al., 2020; Ma et al., 2021; Jaques et al., 2020; 2022): all assume full knowledge of the geometric structure of the scene, thereby limiting their applicability. The key question we ask in this work is: can we recover both the geometric structure and the physical properties of a wide range of objects from multi-view video sequences? This dispenses with all of the assumptions made by state-of-the-art approaches to video-based system identification (known geometries in Ma et al. (2021), and additionally known rendering configurations in Jatavallabhula et al. (2020)).
Additionally, the best-performing approaches to recover the geometries (but not physical properties) of dynamic objects in videos are variants of neural radiance fields (NeRF) (Mildenhall et al., 2020), such as Pumarola et al. (2021); Tretschk et al. (2021); Park et al. (2021). However, all such neural representations of dynamic scenes must learn object dynamics from scratch, requiring a significant amount of data to do so, while also being uninterpretable. We instead employ a differentiable physics simulator as a more prescriptive, data-efficient, and generalizable dynamics model, enabling parameter estimation solely from videos. Our approach, Physics Augmented Continuum Neural Radiance Fields (PAC-NeRF), is a novel system identification technique that assumes nothing about the geometric structure of a system. PAC-NeRF is extremely general, operating on deformable solids, granular media, plastics, and Newtonian/non-Newtonian fluids. PAC-NeRF brings together the best of both worlds: differentiable physics and neural radiance fields for dynamic scenes. By augmenting a NeRF with a differentiable continuum dynamics model, we obtain a unified model that estimates object geometries and their physical properties in a single framework. Specifically, a PAC-NeRF F is a NeRF, comprising a volume density field and a color field, coupled with a velocity field v that admits the continuum conservation law DF/Dt = 0 (Spencer, 2004). In conjunction with a hybrid Eulerian-Lagrangian formulation, this allows us to advect geometry and appearance attributes to all frames in a video sequence, enabling the specification of a reconstruction error in image space. This error term is minimized by gradient-based optimization, leveraging the differentiability of the entire computation graph, and enables system identification over a wide range of physical systems where neither the geometry nor the rendering configurations are known.
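To make the conservation law concrete: DF/Dt = 0 states that the radiance-field attributes carried by a material point are constant along its trajectory, so advecting the field amounts to moving the particles that carry those attributes. The sketch below is a hypothetical, deliberately simplified illustration of this idea (the function name and the explicit-Euler step are our own; the actual pipeline integrates velocities with a differentiable MPM simulator):

```python
import numpy as np

def advect(x, v, dt):
    """Advect particle positions by one explicit-Euler step.

    Per the conservation law DF/Dt = 0, the density/color attributes
    attached to each particle are left untouched: only positions move.
    """
    return x + dt * v

# Two particles carrying fixed (density, color) attributes.
x = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
v = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
x_new = advect(x, v, dt=0.1)  # attributes ride along unchanged
```

Because the attributes never change along trajectories, the rendering loss at any frame can be back-propagated through this chain of position updates to the initial state and the physical parameters.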
Our hybrid representation considerably speeds up the original MLP-based NeRF via an efficient voxel discretization (Sun et al., 2022), and also conveniently handles collisions in continuum simulations, following the MPM pipeline (Jiang et al., 2015). The joint differentiable rendering-simulation pipeline, with a unified Eulerian-Lagrangian conversion, is highly optimized for high-performance computing on GPU. In summary, we make the following contributions:
• We propose PAC-NeRF, a dynamic neural radiance field that satisfies the continuum conservation law (Section 3.1).
• We introduce a hybrid Eulerian-Lagrangian representation, seamlessly blending the Eulerian nature of NeRF with MPM's Lagrangian particle dynamics (Section 3.3).
• Our framework estimates both the geometric structure and physical parameters of a wide variety of complex systems, including elastic materials, plasticine, sand, and Newtonian/non-Newtonian fluids, outperforming state-of-the-art approaches by up to two orders of magnitude (Section 5).
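The Eulerian-Lagrangian conversion at the heart of this pipeline can be pictured as a particle-to-grid (P2G) transfer: per-particle NeRF attributes (density, color) are splatted onto a voxel grid before volume rendering. The snippet below is a hypothetical sketch using nearest-cell averaging; the actual MPM pipeline uses B-spline interpolation kernels over a local stencil, and the transfer is differentiable end to end:

```python
import numpy as np

def splat_to_grid(xs, attrs, grid_res, dx):
    """Nearest-cell particle-to-grid (P2G) transfer of NeRF attributes.

    xs: (N, 3) particle positions; attrs: (N, C) per-particle attributes
    (e.g. volume density and color). Real MPM uses quadratic B-spline
    weights over a 3x3x3 stencil instead of nearest-cell binning.
    """
    grid = np.zeros((grid_res, grid_res, grid_res, attrs.shape[1]))
    weight = np.zeros((grid_res, grid_res, grid_res))
    idx = np.clip((xs / dx).astype(int), 0, grid_res - 1)
    for (ix, iy, iz), a in zip(idx, attrs):
        grid[ix, iy, iz] += a
        weight[ix, iy, iz] += 1.0
    occupied = weight > 0
    grid[occupied] /= weight[occupied][:, None]  # average within each cell
    return grid

# Three particles with one scalar attribute (e.g. density) on a 4^3 grid.
xs = np.array([[0.05, 0.05, 0.05],
               [0.05, 0.05, 0.05],
               [0.35, 0.05, 0.05]])
attrs = np.array([[1.0], [3.0], [5.0]])
grid = splat_to_grid(xs, attrs, grid_res=4, dx=0.1)
```

The inverse grid-to-particle (G2P) interpolation uses the same kernel weights, which is what lets the voxel NeRF and the Lagrangian simulation state share one consistent representation.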

2. RELATED WORK

Neural radiance fields (NeRF), introduced by Mildenhall et al. (2020), are a widely adopted technique for encoding scene geometry in a compact neural network, enabling photo-realistic rendering and depth estimation from novel views. A comprehensive survey of neural fields is available in Xie et al. (2022). In this work, we adopt the voxel representation proposed by Sun et al. (2022), as it does not require positional information and naturally fits the Eulerian stage of the material point method (MPM) used in our physics prior. For perception of dynamic scenes, Li et al. (2021) introduce forward and backward motion fields to enforce consistency in the representation space of neighboring frames. D-NeRF (Pumarola et al., 2021) introduces a canonical frame with a unique neural field for densities and colors, and a time-dependent backward deformation map to query the canonical frame. This representation has since been adopted by Tretschk et al. (2021) and Park et al. (2021). Chu et al. (2022) target smoke scenes and advect the density field by the velocity field of the smoke; this method does not handle boundary conditions, so it cannot model solids or contact. Guan et al. (2022) present a combination of NeRF with intuitive fluid dynamics leveraging neural simulators, whereas we provide a principled and interpretable simulation-and-rendering framework.

* This work was done during an internship at the MIT-IBM Watson AI Lab.
¹ Demos are available on the project webpage: https://sites.google.com/view/PAC-NeRF

