INFERRING FLUID DYNAMICS VIA INVERSE RENDERING

Abstract

Humans have a strong intuitive understanding of physical processes such as falling fluid from just a glimpse of a picture of the scene — an intuition quickly derived from the immersive visual experiences in our memory. This work achieves such a photo-to-fluid-dynamics reconstruction capability, learned from unannotated videos without any supervision from ground-truth fluid dynamics. In a nutshell, a differentiable Euler simulator, modeled with a ConvNet-based pressure projection solver, is integrated with a volumetric renderer, supporting end-to-end, coherently differentiable dynamic simulation and rendering. By endowing each sampled point with a fluid volume value, we derive a NeRF-like differentiable renderer dedicated to fluid data; thanks to this volume-augmented representation, fluid dynamics can be inversely inferred from the error signal between the rendered result and the ground-truth video frame (i.e., inverse rendering). Experiments on our generated Fluid Fall datasets and the DPI Dam Break dataset demonstrate both the effectiveness and the generalization ability of our method.
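As a toy illustration of the inverse-rendering idea described above — not the paper's actual model, which couples a learned Euler simulator with a ConvNet pressure solver — the sketch below endows each sample along a ray with a fluid volume value, alpha-composites the samples NeRF-style into pixels, and then recovers the volumes from the photometric error against an observed frame alone. The grid sizes, the finite-difference optimizer, and all names are illustrative assumptions:

```python
import numpy as np

def render_rays(sigma, dt=0.1):
    """NeRF-style volumetric rendering of a grid of fluid volume values.

    Each row of `sigma` is one camera ray; samples along the ray carry a
    non-negative fluid volume value. Pixels are alpha-composited using the
    transmittance T_i = prod_{j<i} (1 - alpha_j).
    """
    alpha = 1.0 - np.exp(-np.maximum(sigma, 0.0) * dt)      # per-sample opacity
    trans = np.cumprod(1.0 - alpha + 1e-10, axis=1)         # prod_{j<=i}(1-alpha_j)
    trans = np.concatenate([np.ones((sigma.shape[0], 1)),   # shift: T_i excludes i
                            trans[:, :-1]], axis=1)
    return (trans * alpha).sum(axis=1)                      # composited pixel values

rng = np.random.default_rng(0)
true_sigma = np.abs(rng.normal(1.0, 0.5, size=(4, 16)))    # "true" fluid volumes
frame = render_rays(true_sigma)                            # observed frame (4 pixels)

# Inverse rendering: optimize the volumes using only the photometric error.
est = np.full_like(true_sigma, 0.5)
def loss(s):
    return ((render_rays(s) - frame) ** 2).mean()

lr, eps = 5.0, 1e-4
losses = [loss(est)]
for _ in range(200):
    grad = np.zeros_like(est)
    for idx in np.ndindex(est.shape):                      # finite-difference gradient
        bump = est.copy()
        bump[idx] += eps
        grad[idx] = (loss(bump) - losses[-1]) / eps
    est -= lr * grad
    losses.append(loss(est))

assert losses[-1] < losses[0]                              # photometric error decreases
```

In the paper's setting, the finite-difference loop is replaced by backpropagation through the differentiable renderer and simulator, so the same photometric error can update the simulator's parameters end-to-end.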

1. INTRODUCTION

Simulating and rendering complex physics (e.g., fluid, cloth, and hair dynamics) are major topics in computer graphics, with prevailing applications in the game, movie, and animation industries. To pursue better adaptability, generalization ability, and efficiency, many recent works (Wiewel et al., 2019; Kim et al., 2019; Li et al., 2019; Ummenhofer et al., 2019; Sanchez-Gonzalez et al., 2020; Pfaff et al., 2021; Wandel et al., 2022; de Avila Belbute-Peres et al., 2018; Hu et al., 2019; Loper & Black, 2014; Liu et al., 2019; Mildenhall et al., 2020; Liu et al., 2020) focus on either learning-based (differentiable) simulation or rendering, especially learnable simulation. However, most of these works need large amounts of ground-truth data simulated with traditional physics engines to train a robust and general learning-based simulator. Meanwhile, most existing works rely on the traditional pipeline, which requires stand-alone modules for physical dynamics simulation, 3D modeling/meshing, and rasterization- or ray-tracing-based rendering; this complexity makes the pipeline tedious and difficult to deploy for live streaming. A few works (Wu et al., 2017a; Guan et al., 2022; Li et al., 2021b) attempt to combine simulation and rendering into an integrated module. VDA (Wu et al., 2017a) proposes a combined framework in which a physics engine and a non-differentiable graphics rendering engine are used to understand physical scenes without human annotation; however, it supports only rigid-body dynamics. NeuroFluid (Guan et al., 2022) is a concurrent work that proposes a fully differentiable two-stage network and grounds Lagrangian fluid flows using image supervision. However, image sequences provide supervision that reveals only the changes in fluid geometry; they cannot provide particle correspondences between consecutive frames (i.e., temporal/motion ambiguities) to support Lagrangian simulation.
Moreover, many spatial ambiguities between particle positions and the spatial distribution of fluid (e.g., different particle positions may result in the same fluid distribution used for rendering) are introduced by the proposed Particle-Driven NeRF (Guan et al., 2022). In inverse rendering, such ambiguity is a serious problem that makes the optimization process prohibitively ill-posed (Zhao et al., 2021). Consequently, NeuroFluid can only overfit one sequence at a time and cannot learn generalizable fluid dynamics. Li et al. (2021b) develop a similar framework, but it learns dynamics implicitly in a latent space and represents the whole 3D environment with only a single vector; such a coarse and implicit representation limits its generalization ability. We provide detailed discussions and comparisons with (Guan et al., 2022; Li et al., 2021b) in Section 4.4.
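The spatial and motion ambiguities above can be made concrete: distinct particle states can rasterize to an identical density field, so any renderer driven by that field produces identical images, and the photometric error alone cannot disambiguate the underlying particles. A minimal sketch (the splatting function and grid size are illustrative assumptions, not the rasterization used by Particle-Driven NeRF):

```python
import numpy as np

def splat(particles, n=8):
    """Rasterize particle positions in [0,1)^2 onto an n x n density grid."""
    grid = np.zeros((n, n))
    for x, y in particles:
        grid[int(y * n), int(x * n)] += 1.0
    return grid

# Three *different* particle states...
state_a = [(0.10, 0.10), (0.90, 0.90)]
state_b = [(0.90, 0.90), (0.10, 0.10)]   # identities swapped (motion ambiguity)
state_c = [(0.11, 0.12), (0.88, 0.91)]   # shifted within the same cells (spatial ambiguity)

# ...yield the identical density field, hence identical renderings.
assert np.allclose(splat(state_a), splat(state_b))
assert np.allclose(splat(state_a), splat(state_c))
```

This is why the proposed method optimizes a volume-augmented Eulerian representation directly, sidestepping the need to recover per-particle states from images.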

