FREQUENCY-AWARE INTERFACE DYNAMICS WITH GENERATIVE ADVERSARIAL NETWORKS

Abstract

We present a new method for reconstructing and refining complex surfaces based on physical simulations. Taking a roughly approximated simulation as input, our method infers corresponding spatial details while taking into account how they evolve over time. We consider this problem in terms of spatial and temporal frequencies, and leverage generative adversarial networks to learn the desired spatiotemporal signal for the surface dynamics. Furthermore, we investigate the possibility of training our network in an unsupervised manner, i.e., without predefined training pairs. We highlight the capabilities of our method with a set of synthetic wave function tests and complex 3D dynamics of elasto-plastic materials.

1. INTRODUCTION

Complex and chaotic physical phenomena such as liquids, gels, and goo remain very challenging to represent in as much detail and as realistically as possible. A variety of numerical methods have been proposed to simulate such materials, from purely Eulerian methods (Harlow & Welch, 1965; Stam, 1999), over particle-based methods (Gingold & Monaghan, 1977; Ihmsen et al., 2014), to hybrids (Zhu & Bridson, 2005; Stomakhin et al., 2013). Such simulations have also been targeted with deep learning methods (Tompson et al., 2017; Mrowca et al., 2018; Li et al., 2019), but despite significant advances, they remain very time-consuming and highly challenging to solve.

One approach to speed up the necessary calculations and to allow for more control is to employ super-sampling. This can be seen as a form of post-processing where one runs only a low-resolution simulation and uses an up-sampling technique to approximate the behavior of a high-resolution simulation. Neural networks are of special interest here because of their capability to efficiently approximate the strongly nonlinear behavior of physical simulations. Applying neural networks to space-time data sets of physical simulations has seen strongly growing interest in recent years (Ladicky et al., 2015; Kim et al., 2020), and in this context it is particularly interesting to incorporate additional constraints, e.g., for temporal coherence (Xie et al., 2018) or for physical plausibility (Tompson et al., 2017; Kim et al., 2019). An important aspect here is that methods based on simple distance losses, such as mean squared errors, quickly reach their limits: the generated data tends to be smooth and lacks the necessary small-scale features. Generative adversarial networks (GANs) have been proposed to overcome this issue (Goodfellow, 2016).
They are characterized by the fact that, apart from a generative network, they also make use of a discriminator that classifies the results of the generator with respect to the ground-truth data. Via joint training, the distribution of solutions of the generator is guided to approximate the ground-truth data distribution. As the quality of the results is primarily determined by the discriminator network, it remains an open problem to accurately evaluate the quality of the inferred results. In our work, we propose to evaluate the problem in Fourier space. This lets us evaluate the given methods reliably, and it allows us to design improved learning algorithms that more faithfully recover the small-scale details of the reference data.

For the core of our method, we build on an existing GAN-based architecture that employs two discriminator networks, one for the spatial and one for the temporal behavior (Xie et al., 2018). In terms of ground-truth data, we focus on multi-phase (solid-fluid-air) interactions with a sharp fluid-air interface. Unlike single-phase flow, whose details are visible and relevant solely due to transparency throughout the volume, the details of our data are in most cases only visible on the surface. Of course, the internal dynamics in the volume also play a role, but they are mostly hidden from the viewer; only their effects on the surface are visible. Furthermore, we consider phenomena that build up and take place over the course of several frames. Thus, as we will outline below, we employ a recurrent approach that is conditioned on a previous output in order to produce the solution for a subsequent timestep. In order to represent and process fine details, we treat such detail as high-frequency displacements of a low-frequency surface, and correspondingly formulate the problem in Fourier space.
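The view of detail as high-frequency displacements on a low-frequency surface can be illustrated with a minimal NumPy sketch. It is not the implementation used in this work: the 1D periodic height field, the wavenumbers 2 and 40, the detail amplitude 0.1, and the helper names `amplitude_spectrum` and `box_smooth` are all assumptions chosen for illustration, and the moving average merely stands in for the kind of smoothing that a plain distance loss tends to produce.

```python
import numpy as np

def amplitude_spectrum(signal):
    """One-sided amplitude spectrum of a real, periodic 1D signal.

    For 0 < k < n/2, bin k holds the amplitude of the sinusoid
    with k cycles per domain.
    """
    n = signal.shape[0]
    return 2.0 * np.abs(np.fft.rfft(signal)) / n

def box_smooth(signal, half_width):
    """Periodic moving average over 2 * half_width + 1 samples."""
    shifts = range(-half_width, half_width + 1)
    return sum(np.roll(signal, s) for s in shifts) / len(shifts)

# Hypothetical test surface: a low-frequency base shape plus
# a small high-frequency displacement (all values are assumptions).
n = 256
x = np.arange(n) / n
base = np.sin(2.0 * np.pi * 2 * x)            # wavenumber 2, amplitude 1.0
detail = 0.1 * np.sin(2.0 * np.pi * 40 * x)   # wavenumber 40, amplitude 0.1
surface = base + detail

# Stand-in for an overly smooth prediction.
smoothed = box_smooth(surface, half_width=4)

spec_ref = amplitude_spectrum(surface)
spec_smooth = amplitude_spectrum(smoothed)

# Per-frequency comparison: the base mode survives, the detail mode does not.
for k in (2, 40):
    print(f"k={k:3d}  reference={spec_ref[k]:.4f}  smoothed={spec_smooth[k]:.4f}")
```

In this toy setting the spectrum separates the two contributions cleanly: the smoothed signal keeps almost all of its energy at wavenumber 2 but loses most of it at wavenumber 40, a difference that a single scalar error value would largely hide.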
The transformation into Fourier space yields an isolated view of the individual frequencies, and thus allows for a much improved analysis of the results achieved by different methods. For example, it robustly identifies the strong smoothing behavior of L2 metrics, and can detect mode collapse problems of adversarial training runs. We also demonstrate how frequency information can be incorporated into the learning objective in order to improve results. To summarize, the central contributions of our work are:

(1) A method for frequency evaluation with a consideration of spatial properties,
(2) A novel frequency-aware loss formulation,
(3) A simple, yet intuitive evaluation of different generative methods,
(4) A time-consistent spatio-temporal upsampling of complex physical surfaces.

Related Work

Deep learning methods in conjunction with physical models have been employed in a variety of contexts, ranging from learning models for physical intuition (Battaglia et al., 2016; Sanchez-Gonzalez et al., 2018), over robotic control (Schenck & Fox, 2018; Hu et al., 2019), to engineering applications (Ling et al., 2016; Morton et al., 2018). In the following, we focus on fluid-like materials with continuous descriptions, which encompass a wide range of behavior and pose challenging tasks for learning methods (Mrowca et al., 2018; Li et al., 2019). For fluid flows in particular, a variety of learning methods have been proposed (Tompson et al., 2017; Prantl et al., 2017; Um et al., 2018). A common approach to reduce the high computational cost of a simulation is to employ super-resolution techniques (Dong et al., 2016; Chu & Thuerey, 2017; Bai et al., 2019). In this context, our work targets up-sampling for physics-based animations, for which we leverage the approach proposed by Xie et al. (2018). However, in contrast to that work, we target phenomena with clear interfaces, which motivates the frequency-based viewpoint of our work.
For sharp interfaces, Lagrangian models are a very popular discretization of continuum mechanical systems. For example, smoothed particle hydrodynamics (SPH) (Gingold & Monaghan, 1977; Koschier et al., 2019) is a widely used particle-based simulation method. While points and particles are likewise frequently used representations for physical deep learning (Li et al., 2019; Ummenhofer et al., 2019; Sanchez-Gonzalez et al., 2020), Eulerian, i.e., grid-based, representations offer advantages in terms of efficient and robust kernel evaluations.

We employ generative adversarial networks (Goodfellow, 2016) as a powerful and established method for learning generative models. Here, "unconditional" GANs typically rely on a synthetic input vector of Gaussian noise to produce the desired output distribution, e.g., the DC-GAN approach (Radford et al., 2016). Conditional GANs (Mirza & Osindero, 2014) were introduced to provide the network with an input that steers the generation of the output. Hence, super-resolution tasks for natural images (Ledig et al., 2016) and image translation tasks (Isola et al., 2017) employ conditional GANs. The time dimension has also been taken into account in natural imaging works, e.g., in the form of a temporal generator (Saito et al., 2017) or via a stochastic sequence generator (Yu et al., 2017). Other works have included direct L2 loss terms as temporal regularizers (Bhattacharjee & Das, 2017; Chen et al., 2017), which, however, typically restrict changes over time strongly. Similar to flow advection, video networks also often use warping information to align data over time (Liu et al., 2017; de Bezenac et al., 2017). We will demonstrate that recurrent architectures similar to those used for video super-resolution (Sajjadi et al., 2018) are likewise well suited to physical problems over time.

2. METHOD

The input for our method is a coarsely approximated source simulation, with the learning objective to infer the surface of a target simulation over space and time. This target is typically computed via a potentially very costly, finely resolved simulation run for the same physical setup. Simulation data can be represented in a wide variety of ways. In our case, we have chosen an implicit representation of the data in the form of a signed-distance field (SDF), denoted by g : R^3 → R.
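To make the implicit representation concrete, the following sketch samples a signed-distance field g : R^3 → R on a regular grid at a coarse and a fine resolution. It is an illustration under stated assumptions rather than the data pipeline of this work: the analytic sphere, the sign convention (negative inside, positive outside), the domain [-1, 1]^3, the resolutions 17 and 65, and the helper names `sphere_sdf` and `sample_on_grid` are all hypothetical.

```python
import numpy as np

def sphere_sdf(points, center, radius):
    """Signed distance to a sphere for points of shape (..., 3).

    Assumed convention: negative inside, zero on the surface, positive outside.
    """
    return np.linalg.norm(points - center, axis=-1) - radius

def sample_on_grid(sdf, resolution, lo=-1.0, hi=1.0):
    """Evaluate an SDF at the nodes of a regular resolution^3 grid on [lo, hi]^3."""
    axis = np.linspace(lo, hi, resolution)
    nodes = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    return sdf(nodes)

center = np.zeros(3)
radius = 0.5

def g(points):
    # Hypothetical stand-in for the surface of a simulation frame.
    return sphere_sdf(points, center, radius)

coarse = sample_on_grid(g, resolution=17)   # stand-in for the source simulation
fine = sample_on_grid(g, resolution=65)     # stand-in for the target simulation

# The surface is the zero level set; the sign tells inside from outside.
inside_fraction = np.mean(fine < 0.0)
print(coarse.shape, fine.shape, inside_fraction)
```

With this representation, the surface is never stored explicitly; it is recovered as the zero level set of the sampled field, and a coarse and a fine grid describe the same domain at different resolutions.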

