INFERRING FLUID DYNAMICS VIA INVERSE RENDERING

Abstract

Humans have a strong intuitive understanding of physical processes such as falling fluid from just a glimpse of a scene picture, an ability quickly derived from our immersive visual experiences in memory. This work achieves such a photo-to-fluid-dynamics reconstruction capability, learned from unannotated videos without any supervision from ground-truth fluid dynamics. In a nutshell, a differentiable Euler simulator, modeled with a ConvNet-based pressure projection solver, is integrated with a volumetric renderer, supporting end-to-end differentiable dynamic simulation and rendering. By endowing each sampled point with a fluid volume value, we derive a NeRF-like differentiable renderer tailored to fluid data; thanks to this volume-augmented representation, fluid dynamics can be inversely inferred from the error signal between the rendered result and the ground-truth video frame (i.e., inverse rendering). Experiments on our generated Fluid Fall datasets and the DPI Dam Break dataset demonstrate both the effectiveness and the generalization ability of our method.

1. INTRODUCTION

Simulating and rendering complex physics (e.g., fluid dynamics, cloth dynamics, hair dynamics) are major topics in computer graphics, with prevailing applications in the game, movie, and animation industries. To pursue better adaptability, generalization ability, and efficiency, many recent works (Wiewel et al., 2019; Kim et al., 2019; Li et al., 2019; Ummenhofer et al., 2019; Sanchez-Gonzalez et al., 2020; Pfaff et al., 2021; Wandel et al., 2022; de Avila Belbute-Peres et al., 2018; Hu et al., 2019; Loper & Black, 2014; Liu et al., 2019; Mildenhall et al., 2020; Liu et al., 2020) focus on either learning-based (differentiable) simulation or rendering, especially on learnable simulation. However, most of these works need large amounts of ground-truth data simulated with traditional physics engines to train a robust and general learning-based simulator. Meanwhile, most existing works rely on the traditional pipeline; that is, they require stand-alone modules for physical dynamics simulation, 3D modeling/meshing, and rasterization- or ray-tracing-based rendering, a complication that makes them tedious and difficult to deploy for live streaming. Few works (Wu et al., 2017a; Guan et al., 2022; Li et al., 2021b) attempt to unite simulation and rendering in an integrated module. VDA (Wu et al., 2017a) proposes a combined framework in which a physics engine and a non-differentiable graphics rendering engine are used to understand physical scenes without human annotation; however, only rigid-body dynamics are supported. NeuroFluid (Guan et al., 2022) is a concurrent work that proposes a fully-differentiable two-stage network and grounds Lagrangian fluid flows using image supervision. However, image sequences provide supervision that only reveals changes in fluid geometry and cannot provide particle correspondence between consecutive frames (i.e., temporal/motion ambiguities) to support Lagrangian simulation.
Moreover, many spatial ambiguities between particle positions and the spatial distribution of fluid (e.g., different particle positions may result in the same fluid distribution used for rendering) are introduced by the proposed Particle-Driven NeRF (Guan et al., 2022). In inverse rendering, ambiguity is a severe problem that makes the optimization process prohibitively ill-posed (Zhao et al., 2021). Thereby, NeuroFluid can only overfit one sequence at a time and cannot learn fluid dynamics. (Li et al., 2021b) develops a similar framework, but it learns dynamics implicitly in latent space and represents the whole 3D environment with only a single vector; such a coarse and implicit representation limits its generalization ability. We make detailed discussions and comparisons with (Guan et al., 2022; Li et al., 2021b) in Section 4.4. To explicitly address these fundamental limitations, in this work we propose an end-to-end differentiable framework integrating simulation, reconstruction (3D modeling), and rendering, with image/video data as supervision, aiming to learn fluid dynamics models from videos. In a nutshell, a fluid-volume-augmented representation is adopted to link reconstruction and dynamics between simulation and rendering. More concretely, we design a neural simulator in the Eulerian view, where 3D grids are constructed to store the velocity and volume of fluid. In addition to advection and the application of external forces, we develop a ConvNet-based pressure projection solver for differentiable velocity-field updating. Naturally, the fluid volume field is then updated by advection along the updated velocity field. Note that advection is an efficient and differentiable operator that involves back-tracing and interpolation.
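The back-tracing-and-interpolation advection operator mentioned above can be sketched as a standard semi-Lagrangian step. The following is an illustrative NumPy sketch on a 2D grid (the paper operates on 3D grids); the function name, grid layout, and boundary handling are our assumptions, not the authors' implementation:

```python
# Semi-Lagrangian advection sketch (2D for brevity; the paper uses 3D grids).
# Each cell back-traces along the velocity field to find where its material
# came from, then interpolates the field at that (non-integer) location.
import numpy as np
from scipy.ndimage import map_coordinates

def advect(field, vel_x, vel_y, dt):
    """Advect a scalar field (e.g., fluid volume) by velocity (vel_x, vel_y)."""
    ny, nx = field.shape
    ys, xs = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
    # Back-trace: where did the material now in cell (y, x) come from?
    src_x = xs - dt * vel_x
    src_y = ys - dt * vel_y
    # Bilinear (order=1) interpolation at the back-traced positions;
    # both the back-trace and the interpolation are differentiable.
    return map_coordinates(field, [src_y, src_x], order=1, mode="nearest")

# Example: a small blob of fluid volume carried one cell to the right
# by a uniform unit velocity field.
vol = np.zeros((8, 8)); vol[3:5, 1:3] = 1.0
vx = np.ones((8, 8)); vy = np.zeros((8, 8))
vol_next = advect(vol, vx, vy, dt=1.0)
```

In an autograd framework the same operator (grid sampling with linear interpolation) lets gradients flow from the advected volume field back into the velocity field, which is what enables the inverse optimization described above.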
In the meantime, as the Euler 3D grid stores the volume of fluid, a NeRF-like neural renderer is proposed to capture the geometry/shape of the fluid, retrieving supervision signals from images taken from multiple views. Specifically, we assign a fluid volume value to each point sampled along emitted rays by performing trilinear interpolation on the fluid volume field. The sampled points, equipped with fluid volume properties, are then fed into the NeRF model to render an image. The fluid volume value of each point tells the renderer how much fluid material is present there, letting it capture the effect of fluid on density and radiance. This simple and efficient fluid volume representation not only suits a neural Euler fluid engine for fluid motion-field estimation but also supports a differentiable NeRF-like renderer relating images to fluid volume, achieving end-to-end error-signal propagation through the rendering, fluid-volume reconstruction, and simulation modules. Note that the Eulerian field representation naturally provides the spatial distribution of fluid and better correspondence between consecutive frames, which greatly reduces ambiguities in inverse optimization. The whole forward simulation-modeling-rendering process and the inverse procedure are shown in Figure 1. Unlike methods (Niemeyer et al., 2019; Pumarola et al., 2021; Tretschk et al., 2021; Ost et al., 2021; Li et al., 2021a; Guan et al., 2022) that typically fit one sequence, our model can be trained on abundant sequences simultaneously and generalizes to unseen sequences with different initial conditions (i.e., initial shape, position, velocity, etc.). Besides, we model and simulate fluid in an explicit and interpretable way, in contrast to the method (Li et al., 2021b) that learns 3D physical scenes in latent space.
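The trilinear lookup that attaches a fluid volume value to each ray sample can be sketched as follows. This is a minimal NumPy illustration under our own assumptions (function name, coordinate convention, and boundary clamping are hypothetical); in practice this query would run inside the differentiable renderer, with its output concatenated to the point's positional encoding before the NeRF MLP:

```python
# Trilinear interpolation of a (D, H, W) fluid volume field at continuous
# 3D sample points taken along camera rays (points given in grid coordinates).
import numpy as np

def trilinear_sample(volume_grid, pts):
    """Return the interpolated fluid volume at each point in pts (N, 3)."""
    d, h, w = volume_grid.shape
    # Clamp so all 8 surrounding grid corners stay inside the volume.
    p = np.clip(pts, 0, np.array([d, h, w], dtype=float) - 1 - 1e-6)
    p0 = np.floor(p).astype(int)   # lower-corner integer indices
    f = p - p0                     # fractional offsets along each axis
    out = np.zeros(len(pts))
    for dz in (0, 1):              # accumulate the 8 corner contributions
        for dy in (0, 1):
            for dx in (0, 1):
                wgt = (np.where(dz, f[:, 0], 1 - f[:, 0]) *
                       np.where(dy, f[:, 1], 1 - f[:, 1]) *
                       np.where(dx, f[:, 2], 1 - f[:, 2]))
                out += wgt * volume_grid[p0[:, 0] + dz,
                                         p0[:, 1] + dy,
                                         p0[:, 2] + dx]
    return out  # per-point fluid volume fed to the renderer

# Example: a field that increases linearly along z is reproduced exactly.
grid = np.arange(4, dtype=float)[:, None, None] * np.ones((1, 4, 4))
vals = trilinear_sample(grid, np.array([[1.5, 2.2, 0.7], [0.0, 0.0, 0.0]]))
```

Because every operation here is piecewise-linear in the grid values, gradients of the rendering loss with respect to the fluid volume field are well defined, which is what closes the loop from image error back to the simulator.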
As shown in Figure 10, such an explicit representation endows our method with a strong ability to render images from extreme views far outside the training distribution and to perform scene editing efficiently. We conduct experiments on a subset of the DPI DamBreak dataset (Li et al., 2019) and on two datasets that we generate using Mantaflow (Pfaff & Thuerey) and Blender. Experiments covering baseline comparison, representation comparison, future prediction, novel view synthesis, and scene editing demonstrate both the effectiveness and the generalization ability of our method. Detailed ablation studies analyze the important components and parameters. Upon acceptance, all code and data will be publicly available. We also discuss the limitations of our work in Appendix A.10.

2. RELATED WORK

Fluid Simulation. Fluid simulation is a long-standing research area of great interest in science and engineering disciplines. Various classical algorithms (Chorin, 1968; Stam, 1999; Macklin et al., 2014; Fedkiw et al., 2001; Monaghan, 1994; Solenthaler & Pajarola, 2009; Macklin & Müller, 2013; Bender & Koschier, 2015; Bardenhagen et al., 2000; Zehnder et al., 2018; Ando et al., 2015; Zhang & Bridson, 2014; Brackbill et al., 1988; Jiang et al., 2015; Hu et al., 2018) have been proposed to facilitate accurate and fast simulation. To pursue better adaptability, generalization ability, and efficiency, learning-based fluid simulation (Li et al., 2019; Ummenhofer et al., 2019; Sanchez-Gonzalez et al., 2020; Pfaff et al., 2021; Tompson et al., 2017; Wiewel et al., 2019; Zhu et al., 2019; Kim et al., 2019; Thuerey et al., 2020; Wandel et al., 2020) has attracted increasing attention in recent years. Most of these works bypass solving large-scale partial differential equations (PDEs) via efficient convolution operators. Lagrangian methods (Li et al., 2019; Ummenhofer et al., 2019; Sanchez-Gonzalez et al., 2020) usually model the fluid and rigid bodies as sets of particles with different material types; graph neural networks (Sanchez-Gonzalez et al., 2020; Pfaff et al., 2021) are also well suited to such problems. Eulerian methods (Tompson et al., 2017; Wiewel et al., 2019; Kim et al., 2019) divide space into regular grids and store physical quantities of the material (e.g., density, mass, volume, velocity) in the grid cells. (Tompson et al., 2017) accelerates Euler fluid simulation using a convolutional network to solve pressure projection. (Wiewel et al., 2019) encodes the pressure field into a latent space and designs an LSTM-based network to predict the latent code

