EAGLE: LARGE-SCALE LEARNING OF TURBULENT FLUID DYNAMICS WITH MESH TRANSFORMERS

Abstract

Estimating fluid dynamics is classically done through the simulation and integration of numerical models solving the Navier-Stokes equations, which is computationally complex and time-consuming even on high-end hardware. This is a notoriously hard problem, which has recently been addressed with machine learning, in particular graph neural networks (GNN) and variants trained and evaluated on datasets of static objects in static scenes with fixed geometry. We attempt to go beyond existing work in complexity and introduce a new model, method and benchmark. We propose EAGLE, a large-scale dataset of ∼1.1 million 2D meshes resulting from simulations of unsteady fluid dynamics caused by a moving flow source interacting with nonlinear scene structure, comprising 600 different scenes of three different types. To perform future forecasting of pressure and velocity on the challenging EAGLE dataset, we introduce a new mesh transformer. It leverages node clustering, graph pooling and global attention to learn long-range dependencies between spatially distant data points without needing a large number of iterations, as existing GNN methods do. We show that our transformer outperforms state-of-the-art methods on both existing synthetic and real datasets, as well as on EAGLE. Finally, we highlight that our approach learns to attend to the airflow, integrating complex information in a single iteration.

1. INTRODUCTION

Despite consistently being at the center of attention of mathematics and computational physics, solving the Navier-Stokes equations governing fluid mechanics remains an open problem. In the absence of an analytical solution, fluid simulations are obtained by spatially and temporally discretizing differential equations, for instance with the finite volume or finite element method. These simulations are computationally intensive, take up to several weeks for complex problems and require expert configuration of numerical solvers. Neural network-based physics simulators may represent a convenient substitute in many ways. Beyond the expected speed gain, their differentiability would allow for direct optimization of fluid mechanics problems (airplane profiles, turbulence resistance, etc.), opening the way to replacing traditional trial-and-error approaches. They would also be an alternative for solving complex PDEs where numerical resolution is intractable. Yet, the development of such models is slowed down by the difficulty of collecting data in sufficient quantities to reach generalization. Velocity and pressure field measurements on real-world systems require large and expensive devices, and simulation faces the problems described above. For all these reasons, few datasets are freely available for training high-capacity neural networks, and the existing ones either address relatively simple problems which can be simulated in reasonable time and exhibit very similar behaviors (2D flow past a cylinder, airfoils) (Pfaff et al., 2021; Han et al., 2022), or offer simulations of very high precision, but limited to a few different samples only (Graham et al., 2016; Wu et al., 2017).

Figure 1: We introduce EAGLE, a large-scale dataset for learning complex fluid mechanics, accurately simulating the air flow created by a 2D drone in motion and interacting with scenes of varying 2D geometries. We address the problem through an autoregressive model and self-attention over tokens at a coarser resolution, allowing the integration of long-range dependencies in a single hop, shown in the given example by the attention distributions for ◼, which follow the airflow.

In this paper, we introduce EAGLE, a large-scale dataset for learning unsteady fluid mechanics. We accurately simulate the airflow produced by a two-dimensional unmanned aerial vehicle (UAV) moving in 2D environments with different boundary geometries. This choice has several benefits. It models the complex ground-effect turbulence generated by the airflow of a UAV following a control law, and is thus, to our knowledge, significantly more challenging than existing datasets. It leads to highly turbulent and non-periodic eddies, and to high flow variety, as the different scene geometries generate completely different outcomes. At the same time, the restriction to a 2D scene (similar to existing datasets) makes the problem manageable and allows for large-scale amounts of simulations (∼1.1M meshes). The dataset will be made publicly available upon publication. As a second contribution, we propose a new multi-scale attention-based model, which circumvents the quadratic complexity of multi-head attention by projecting the mesh onto a learned coarser representation yielding fewer but more expressive nodes. In contrast to standard approaches based on graph neural networks, we show that our model dynamically adapts to the airflow in the scene by focusing attention not only locally, but also over larger distances. More importantly, attention for specific heads seems to align with the predicted airflow, providing evidence of the capacity of the model to integrate long-range dependencies in a single hop (see Figure 1).
We evaluate the method on several datasets and achieve state-of-the-art performance on two public fluid mechanics datasets (Cylinder-Flow (Pfaff et al., 2021) and Scalar-Flow (Eckert et al., 2019)), as well as on EAGLE.

2. RELATED WORK

Fluid datasets for deep learning -are challenging to produce in many ways. Real-world measurement is complicated, requiring complex velocimetry devices (Wang et al., 2020; Discetti & Coletti, 2018; Erichson et al., 2020). Remarkably, Eckert et al. (2019) and De Bézenac et al. (2019) leverage alignment with numerical simulation to extrapolate precise ground-truth flows for real-world phenomena (smoke clouds and sea surface temperature). Fortunately, accurate simulation data can be acquired through several solvers, ranging from computer graphics-oriented simulators (Takahashi et al., 2021; Pfaff & Thuerey, 2016) to accurate computational fluid dynamics solvers (OpenFOAM©, Ansys© Fluent, ...). A large body of work (Chen et al., 2021a; Pfaff et al., 2021; Han et al., 2022; Stachenfeld et al., 2021) introduces synthetic datasets limited to simple tasks, such as 2D flow past a cylinder. EAGLE falls into this synthetic category, but differs in two main points: (a) simulations rely on hundreds of procedurally generated scene configurations, requiring several weeks of calculations on a high-performance computer, and (b) we used an engineer-grade fluid solver with a demanding turbulence model and a fine domain discretization. For a comparison, see table 1. Learning of fluid dynamics -is mainly addressed with message passing networks. Recent work focuses in particular on smoothed-particle hydrodynamics (SPH) (Shao et al., 2022; Ummenhofer et al., 2020; Shlomi et al., 2021; Li et al., 2019; Allen et al., 2022), which corresponds to a Lagrangian representation of fluids. Sanchez-Gonzalez et al. (2020) propose to chain graph neural networks in an Encode-Process-Decode pipeline to learn interactions between particles.
The proximity to images makes uniform grids appealing, which has led to the usage of convolutional networks for simulation and learning (De Bézenac et al., 2019; Ravuri et al., 2021; Ren et al., 2022; Liu et al., 2022; Le Guen & Thome, 2020). For instance, Stachenfeld et al. (2021) apply the principles introduced in Sanchez-Gonzalez et al. (2020) to uniform grids for the prediction of turbulent phenomena. However, uniform grids suffer from limitations that hinder their generalized use: they adapt poorly to complex geometries, especially strongly curved spatial domains, and their spatial resolution is fixed, requiring a large number of cells for a given precision. Deep learning on non-uniform meshes -is a convenient way of solving the issues raised by uniform grids. Nodes can be sparser in some areas and denser in areas of interest. Graph networks (Battaglia et al., 2016) are well suited for this type of structure. The task was notably introduced in Pfaff et al. (2021) with MeshGraphNet, an Encode-Process-Decode pipeline solving mesh-based physics problems. Lienen & Günnemann (2022) introduced a graph network structure algorithmically aligned with the finite element method and show good performance on several public datasets. Close to our work, Han et al. (2022) leverage a temporal attention mechanism on a coarser mesh to enhance forecasting accuracy over longer horizons. In contrast, our model is based on a spatial transformer, allowing a node to communicate not only with its neighbors but also over greater distances by dynamically adapting attention to the airflow. EAGLE is comprised of fine-grained fluid simulations defined on irregular triangular meshes, which we argue are more suited to a broader range of applications than regular grids and thus more representative of industrial standards; mesh-based representations also seem to transfer well to simulators based on machine learning (Pfaff et al., 2021).
However, using triangular meshes with neural networks is not as straightforward as using regular grids. Geometric deep learning (Bronstein et al., 2021) and graph networks (Battaglia et al., 2018) have established baselines, but this remains an active domain of research. Existing datasets focus on well-studied tasks such as the flow past an object (Chen et al., 2021a; Pfaff et al., 2021) or turbulent flow over an airfoil (Thuerey et al., 2020; Sekar et al., 2019). Analytical solutions exist for some of these problems, and they rely on a large body of work from the physics community. However, the generated flows, while being turbulent, are merely steady or periodic despite variations in the geometry. With EAGLE, we propose a complex task, with convoluted, unsteady and turbulent air flow with minimal resemblance between simulations.

3. THE EAGLE DATASET AND BENCHMARK

Purpose -we built EAGLE in order to meet a growing need for a fluid mechanics dataset in accordance with the methods used in engineering, i.e. reasoning on irregular meshes. To significantly increase the complexity of the simulations compared to existing datasets, we propose a proxy task consisting of studying the airflow produced by a dynamically moving UAV in many scenes with variable geometry. This is motivated by the highly non-steady turbulent outcomes that this task generates, yielding challenging airflow to be forecast. Particular attention has also been paid to the practical usability of EAGLE with respect to the state of the art in future forecasting of fluid dynamics, by controlling the number of mesh points and limiting confounding variables to a moderate number (i.e. scene geometry and drone trajectory). Simulation and task definition -we simulate the complex airflow generated by a 2D unmanned aerial vehicle maneuvering in 2D scenes with varying floor profiles. While the scene geometry varies, the UAV trajectory is constant: the UAV starts in the center of the scene and navigates, hovering near the floor surface. During the flight, the two propellers generate high-paced air flows interacting with each other and with the structure of the scene, causing convoluted turbulence. To produce a wide variety of different outcomes, we procedurally generate a large number of floor profiles by interpolating a set of randomly sampled points within a certain range. The choice of interpolation order induces drastically different floor profiles, and therefore distinct outcomes from one simulation to another. EAGLE contains three main types of geometry depending on the type of interpolation (see Figure 2): (i) Step: surface points are connected using step functions (zero-order interpolation), which produces very stiff angles with drastic changes of the air flow when the UAV hovers over a step.
(ii) Triangular: surface points are connected using linear functions (first-order interpolation), causing the appearance of many small vortices at different locations in the scene. (iii) Spline: surface points are connected using spline functions with smooth boundaries, causing long and fast trails of air, occasionally generating complex vortices. EAGLE contains about 600 different geometries (200 geometries of each type) corresponding to roughly 1,200 flight simulations (one geometry gives two flight simulations, depending on whether the drone is going to the right or to the left of the scene), performed at 30 fps over 33 seconds, resulting in 990 time steps per simulation. Physically plausible UAV trajectories are obtained through MPC control of a (flow-agnostic) dynamical system we design for a 2D drone. More details and statistics are available in appendix A. We simulated the temporal evolution of the velocity field as well as the pressure field (both static and dynamic) defined over the entire domain. Due to source motion, the triangle mesh on which these fields are defined needs to be dynamically adapted to the evolving scene geometry. More formally, the mesh is a valued dynamical graph M^t = (N^t, E^t, V^t, P^t), where N is the set of nodes, E the set of edges, V a field of velocity vectors and P a field of scalar pressure values. Both physical quantities are expressed at node level. Note that the dynamical mesh is completely flow-agnostic, thus no information about the flow can be extrapolated directly from the future node positions. Time dependency will be omitted when possible for the sake of readability. Numerical simulations -were carried out using the software Ansys© Fluent, which solves the Reynolds-Averaged Navier-Stokes equations with the Reynolds stress model. The latter uses five equations to model turbulence and is more accurate than standard k-ε or k-ω models (two equations).
This resulted in 3.9TB of raw data with ∼162,760 control points per mesh. We down-sampled this to 3,388 points on average, and compressed the dataset to 270GB. Details and illustrations are given in appendix A.

Figure 3: The mesh transformer encodes the input mesh node values (positions, pressure and velocity), reduces the spatial resolution through clustering + graph pooling, and performs multi-head self-attention on the coarser level of cluster centers. A decoder upsamples the token embeddings to the original resolution and predicts pressure and velocity at time step t+1.

Task -for what follows, we define x_i as the 2D position of node i, v_i its velocity, p_i its pressure and n_i its node type, which indicates whether the node belongs to a wall, an input or an output boundary. We are interested in the following task: given the complete simulation state at time t, namely M^t, as well as the future mesh geometry N^{t+h}, E^{t+h}, forecast the future velocity and pressure fields V^{t+h}, P^{t+h}, i.e. for all positions i we predict v̂^{t+h}_i, p̂^{t+h}_i over a horizon h. Importantly, we consider the dynamical re-meshing step N^t → N^{t+h} to be known during inference; it is thus not required to be forecast.
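The forecasting task above amounts to an autoregressive rollout: the model predicts one step, and its own output is fed back as input for the next step, while the (known) future mesh geometry is provided externally. A minimal sketch, where `model_step` and the mesh container are hypothetical placeholders rather than the paper's actual interfaces:

```python
import numpy as np

def rollout(model_step, mesh_seq, v0, p0, horizon):
    """Autoregressive forecasting sketch for the EAGLE task.

    Starting from the known state (v0, p0) at time t, a one-step model is
    applied repeatedly, feeding back its own predictions. The future mesh
    geometry mesh_seq[h] (node positions, edges, node types) is assumed
    known and passed to the model; only velocity and pressure are forecast.
    `model_step` is a placeholder for any learned simulator.
    """
    v, p = v0, p0
    predictions = []
    for h in range(horizon):
        v, p = model_step(mesh_seq[h], v, p)  # one forecasting step
        predictions.append((v, p))
    return predictions
```

A trained network would replace `model_step`; during training, the loss is accumulated over the whole predicted horizon.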

4. LEARNING UNSTEADY AIRFLOW

Accurate flow estimations require data on a certain minimum spatial and temporal scale. Deviations from optimal resolutions, i.e. data sampled with lower spatial resolutions or lower frame rates, are typically very hard to compensate for through models of higher complexity, in particular when the estimation is carried out through numerical simulations with an analytical model. The premise of our work is that machine learning can compensate for a loss in resolution by picking up longer-range regularities in the data, trading data resolution for complexity in the modeled interactions. Predicting the outcome for a given mesh position may therefore require information from a larger neighborhood, whose size can depend on factors like resolution, compressibility, Reynolds number, etc. Regularities and interactions on meshes and graphs have classically been modeled with probabilistic graphical models (MRFs (Geman & Geman, 1984), CRFs (Lafferty et al., 2001), RBMs (Smolensky, 1986), etc.), and in the DL era through geometric DL (Bronstein et al., 2021) and graph networks (Battaglia et al., 2018), or through deep energy-based models. These models can capture long-range dependencies between distant nodes, but need to exploit them through multiple iterations. In this work we argue for the benefits of transformers and self-attention (Vaswani et al., 2017), which in principle are capable of integrating long-range interactions in a single step. However, the quadratic complexity of transformers in terms of the number of tokens makes their direct application to large meshes expensive. While low-complexity variants do exist, e.g. (Katharopoulos et al., 2020), we follow a different Ansatz, shown in Figure 3: we combine graph clustering and learned graph pooling to perform full attention on a coarser scale with higher-dimensional node embeddings.
This allows the dot-product similarity of the transformer model, which is at the heart of the crucial attention operations, to operate on a semantic representation instead of on raw input signals, similar to the settings in other applications: in NLP, attention typically operates on word embeddings (Vaswani et al., 2017), and in vision either on patch embeddings (Dosovitskiy et al., 2021) or on convolutional feature map cells (Wang et al., 2018). In the sequel, we present the main modules of our model; further details are given in appendix B. Offline Clustering -we down-scale the mesh resolution through geometric clustering, which is independent of the forecasting operations and therefore pre-computed offline. A modified k-means clustering is applied to the vertices N^t of each time step and creates clusters with a constant number of nodes; details are given in appendix B.1. The advantages are twofold: (a) the irregularity and adaptive resolution of the original mesh are preserved, as high-density regions require more clusters, and (b) constant cluster sizes facilitate parallelization and allow speeding up computations. In what follows, let C_k be the k-th cluster computed on mesh M^t. Encoder -the initial mesh M^t is converted into a graph G using the encoder of Pfaff et al. (2021). More precisely, node and edge features are computed using MLPs φ_node and φ_edge, giving η^1_i = φ_node(v_i, p_i, n_i) and e^1_ij = φ_edge(x_i - x_j, ‖x_i - x_j‖). The encoder also computes an appropriate positional encoding based upon a spectral projection F(x). We also leverage the local position of each node in its cluster: let x̄_k be the barycenter of cluster C_k, then the local encoding of node i belonging to cluster k is the concatenation f_i = [F(x_i), F(x̄_k - x_i)]^T.
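To illustrate the idea of clustering with a constant number of nodes per cluster, here is a minimal sketch. The paper's exact modified k-means is described in its appendix B.1; this greedy, capacity-constrained variant is an assumption made for illustration only:

```python
import numpy as np

def balanced_clusters(points, cluster_size, n_iters=10, seed=0):
    """Group 2D mesh nodes into clusters of (near-)constant size.

    Hypothetical stand-in for the paper's modified k-means: centroids are
    updated as in standard k-means, but the assignment step greedily fills
    each cluster up to `cluster_size`, so dense regions of the mesh end up
    covered by more clusters, preserving its adaptive resolution.
    """
    rng = np.random.default_rng(seed)
    n = len(points)
    k = int(np.ceil(n / cluster_size))
    centroids = points[rng.choice(n, k, replace=False)]
    for _ in range(n_iters):
        # Distance of every node to every centroid.
        d = np.linalg.norm(points[:, None] - centroids[None], axis=-1)
        assign = np.full(n, -1)
        counts = np.zeros(k, dtype=int)
        # Assign nodes in order of their best distance, respecting capacity.
        for i in np.argsort(d.min(axis=1)):
            for c in np.argsort(d[i]):
                if counts[c] < cluster_size:
                    assign[i] = c
                    counts[c] += 1
                    break
        # Standard k-means centroid update.
        for c in range(k):
            if counts[c] > 0:
                centroids[c] = points[assign == c].mean(axis=0)
    return assign, centroids
```

Since the clustering depends only on node positions, it can be pre-computed offline for every time step, as described above.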
Finally, a series of L Graph Neural Network (GNN) layers extracts local features through message passing:

e^{l+1}_ij = e^l_ij + ε_ij, with ε_ij = ψ^l_edge([η^l_i, f_i], [η^l_j, f_j], e^l_ij),
η^{l+1}_i = η^l_i + ψ^l_node([η^l_i, f_i], Σ_j ε_ij). (2)

The superscript l indicates the layer, and ψ^l_node and ψ^l_edge are MLPs which encode nodes and edges, respectively. The exact architecture hyper-parameters are given in appendix B. For the sake of readability, in what follows we will denote η_i = η^L_i and e_ij = e^L_ij. Graph Pooling -summarizes the state of the nodes of the same cluster C_k in a single high-dimensional embedding w_k, on which the main neural processor will reason. This is performed with a Gated Recurrent Unit (GRU) (Cho et al., 2014), where the individual nodes are integrated sequentially in a random order. This allows learning a more complex integration of features than a sum. Given an initial GRU state h^0_k = 0, node embeddings are integrated iteratively, indicated by superscript n:

h^{n+1}_k = GRU([η_i, f_i], h^n_k), i ∈ C_k, w_k = φ_cluster(h^N_k),

where N = |C_k| and φ_cluster is an MLP. GRU(⋅) denotes the update equations of a GRU, where we omit gating functions from the notation. The resulting set of cluster embeddings W = {w_k}_{k=1..K} significantly reduces the spatial complexity of the mesh. Attention Module -consists of a transformer with M layers of multi-head attention (MHA) (Vaswani et al., 2017) working on the embeddings W of the coarse graph. Setting w^1_k = w_k, we get for layer m:

w^{m+1}_k = MHA(Q = [w^m_k, F(x̄_k)], K = W^m, V = W^m), (3)

where W^m = {w^m_k}_{k=1..K}, and Q, K and V are, respectively, the query, key and value mappings of a transformer. We refer to Vaswani et al. (2017) for the details of multi-head attention, denoted as MHA(⋅). Decoder -the output of the attention module is calculated on the coarse scale, one embedding per cluster.
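The residual message-passing update above can be sketched numerically. The MLPs are replaced here by random linear-plus-ReLU stand-ins (an assumption; the real ψ_edge and ψ_node are learned multi-layer networks), and node and edge embeddings are assumed to share the same width:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(d_in, d_out):
    # Hypothetical stand-in for psi_edge / psi_node: one random linear
    # layer with ReLU, just enough to exercise the update rule.
    W = rng.normal(0.0, 0.1, (d_in, d_out))
    return lambda x: np.maximum(x @ W, 0.0)

def message_passing_step(eta, e, f, edges, psi_edge, psi_node):
    """One residual message-passing layer in the spirit of the update
    described above: edge embeddings are updated from both endpoint
    embeddings, node embeddings from the sum of incident edge messages."""
    src, dst = edges[:, 0], edges[:, 1]
    # Edge message from [eta_i, f_i], [eta_j, f_j] and the edge embedding.
    msg_in = np.concatenate([eta[src], f[src], eta[dst], f[dst], e], axis=-1)
    eps = psi_edge(msg_in)              # epsilon_ij
    e_new = e + eps                     # residual edge update
    agg = np.zeros_like(eta)
    np.add.at(agg, src, eps)            # sum_j epsilon_ij per node i
    eta_new = eta + psi_node(np.concatenate([eta, f, agg], axis=-1))
    return eta_new, e_new
```

Chaining L such steps yields the final node embeddings on which pooling and attention operate.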
The decoder upsamples the representation and outputs the future pressure and velocity fields on the original mesh. This upsampling is done by taking the original node embedding η_i and concatenating it with the cluster embedding w^M_k, followed by the application of a GNN, whose role is to take the information produced on the coarser level and correctly distribute it over the nodes i. To this end, the GNN also has access to the positional encoding of the node, which is concatenated as well:

v̂^{t+1}_i = v^t_i + δ_v, p̂^{t+1}_i = p^t_i + δ_p, with (δ_v, δ_p) = GNN([η_i, w^M_k, f_i]), (4)

where i ∈ C_k and GNN(⋅) is the graph network variant described in equation (2); parameters are not shared. Our model is trained end-to-end, minimizing the forecasting error over a horizon H, where α balances the importance of the pressure field over the velocity field:

L = Σ_{i=1}^H MSE(v(t+i), v̂(t+i)) + α Σ_{i=1}^H MSE(p(t+i), p̂(t+i)). (5)

5. EXPERIMENTS

We compare our method against three competing methods for physical reasoning: MeshGraphNet (Pfaff et al., 2021); graph attention transformers (GAT) (Veličković et al., 2017), where, compared to our mesh transformer, attention is computed over the one-ring of each node only; and DilResNet (DRN) (Stachenfeld et al., 2021), which differs from the other models as it does not reason over non-uniform meshes, but instead uses dilated convolution layers to perform predictions on regular grids. To evaluate this model on EAGLE, we interpolate the grid-based simulation over the original mesh, see appendix A.2. During validation and testing, we project the airflow back from the grid to the original mesh in order to compute comparable metrics. All baselines have been adapted to the dataset using hyper-parameter sweeps, which mostly led to increases in capacity, explained by EAGLE's complexity. We also compare on two other datasets: Cylinder-Flow (Pfaff et al., 2021) simulates the airflow behind cylinders with different radii and positions. This setup produces turbulent yet periodic airflow corresponding to a Kármán vortex street. Scalar-Flow (Eckert et al., 2019) contains real-world measurements of smoke clouds. This dataset is built using velocimetry measurements combined with numerical simulation aligned with the observations. Following Lienen & Günnemann (2022) and Kohl et al. (2020), we reduce the data to a 2D grid-based simulation by averaging along the x-direction. We evaluate all models by reporting the sum of the root mean squared errors (N-RMSE) on both pressure and velocity fields, which have been normalized with respect to the training set (centered and reduced), and we provide finer-grained metrics in appendix C.1. Existing datasets -show limited ability to discriminate the performance of fluid mechanics models (see table 2). On Cylinder-Flow, both our model and MeshGraphNet reach near-perfect forecasting accuracy.
Qualitatively, flow fields are hardly distinguishable from the ground truth, at least for the considered horizon (see appendix C.2). As stated in the previous sections, this dataset is a great task to validate fluid simulators, but may be considered saturated. Scalar-Flow is a much more challenging benchmark, as these real-world measurements are limited in resolution and quantity. Our model obtains good quantitative results, especially on a longer horizon, showing robustness to error accumulation during auto-regressive forecasting. Yet, no model achieves visually satisfactory results; the predictions remain blurry and leave room for improvement (cf. figure in appendix).

Comparisons with the baselines on EAGLE provide evidence for the interest in modeling long-range interactions with self-attention. GAT seems to struggle on our challenging dataset. The required increase in capacity was difficult to achieve for this resource-hungry model; we failed even on the 40GB A100 GPUs of a high-end Nvidia DGX.

DilResNet shows competitive performance on EAGLE, consistent with the claims of the original paper. However, it fails to predict the details of the vortices (cf. Figure 4). This model leverages grid-based data, hence was trained on a voxelized simulation, finally projected back onto the triangular mesh during testing. This requires precaution in assessment. We limit the projection error by using images containing ten times more pixels than there are nodes in the actual mesh. Yet, even at that scale, we measure that the reconstruction error represents roughly a third of the final N-RMSE. This points out that grid-based methods are not suited for complex fluid problems such as EAGLE, which require finer spatial resolution near sensitive areas. We expose failure cases in appendix C.3.

Self-attention - is a key feature of our model, as shown in Figures 5b and c, which plot the gradient intensity of a selected predicted point situated on the trail w.r.t. all input points, for fixed trained model weights. MeshGraphNet is inherently limited to a neighborhood determined by the number of chained GNNs (the receptive field), represented as concentric black circles overlaid over the gradients. In contrast, our model is not spatially limited and can exchange information across the entire scene, even in a single step. The gradients show that this liberty is exploited. In the same figure, we also show the attention maps, per head and layer, for the selected point near the main trail (Figure 5d). Interestingly, our model learns to attend not only to the neighborhood (as a GNN would), but also to much farther areas. More importantly, we observe that certain heads explicitly (and dynamically) focus on the airflow, which provides evidence that attention is guided by the regularities in the input data. We released an online tool allowing interactive visualization and exploration of attention and predictions, available at https://eagle-dataset.github.io.
Ablation studies - indicate how global attention impacts performance: (a) closer to MeshGraphNet, we replace the attention layers by GNNs operating on the coarser mesh, allowing message passing between nearest clusters only; (b) we limit the receptive field of MHA to the one-ring of the cluster; (c) we enforce uniform attention by replacing it with an average operation. As shown in table 6, attention is a key design choice. Disabling attention to distant points has a negative impact on RMSE, indicating that the model leverages efficient long-range dependencies. Agnostic attention to the entire scene is not pertinent either: to be effective, attention needs to dynamically adapt to the predicted airflow.

Table 3: Generalization to unseen geometries: we evaluate our model and MeshGraphNet in different setups, evaluating on all geometry types but removing one from training. Our model shows satisfactory generalization, also highlighting the complementarity of each simulation type.
We also conduct a study on generalization to down-sampled meshes in appendix C.4. The role of clustering - is to summarize a set of nodes into a unique feature vector. Arguably, with bigger clusters, more node-wise information must be aggregated into a finite-dimensional vector. We indeed observe a slight increase in N-RMSE when the cluster size increases (Figure 7a). Nonetheless, our model appears to be robust to even aggressive graph clustering, as the drop remains limited and still outperforms the baselines. A qualitative illustration is shown in figure 7 (left), where we simulate the flow up to 400 time-steps forward and observe the error on a relatively turbulent region. Clustering also acts on the complexity of our model by reducing the number of tokens on which attention is computed. We measure a significant decrease in inference time and number of operations (FLOPs) even when we limit clusters to a small size (Figure 7b and c). Generalization experiments - highlight the complementarity of the geometry types in EAGLE, since the removal of one geometry from the training set impacts the performance on the others. MeshGraphNet suffers the most, resulting in a drop ranging from 10% on average (ablation of Spline) to 67% (ablation of Step). For our model, the performance losses are limited for the ablation of Step and Spline. The most challenging geometry is arguably Triangular, as the ground profile tends to generate more turbulent and convoluted flows.

6. CONCLUSION

We presented a new large-scale dataset for deep learning in fluid mechanics. EAGLE contains accurate simulations of the turbulent airflow generated by a flying drone in different scenes. Simulations are unsteady, highly turbulent and defined on dynamic meshes, which represents a real challenge for existing models. To the best of our knowledge, we release the first publicly available dataset of this scale, complexity, precision and variety. We proposed a new model leveraging mesh transformers to efficiently capture long-distance dependencies on a coarser scale. Through graph pooling, we show that our model reduces the complexity of multi-head attention and outperforms the competing state of the art on both existing datasets and EAGLE. We showed across various ablations and illustrations that global attention is a key design choice, and observed that the model naturally attends to the airflow. Future work will investigate the impact of implicit representations on fluid mechanics, and we discuss the possibility of an extension to 3D data in appendix D.

Finally, the edge set ℰ′(𝑡) is computed using constrained Delaunay triangulation to prevent triangles from spawning outside of the domain. Once (𝒱′(𝑡), ℰ′(𝑡)) has been computed, we evaluate the pressure and velocity fields 𝑝(𝑡), 𝑣(𝑡) on the nodes by averaging the three nearest points in the raw simulation data. We illustrate the final result in figure 8c. Better mesh simplification algorithms exist, notably minimizing the interpolation error, yet such algorithms rely on the simulated flow to compute the mesh, which may embed unwanted biases or shortcuts in the mesh geometry.

A.2 GRID BASED DATASET

One of the baselines, DilResNet (Stachenfeld et al., 2021), relies on convolutional layers for future forecasting of turbulent flows, and therefore requires projecting EAGLE and Cylinder-Flow onto a uniform rectangular grid. However, such a discretization scheme cannot adapt its spatial resolution as a function of the geometry of the scene, which therefore constitutes a disadvantage with respect to an irregular triangular mesh. To limit this effect, the resolution of the grid is chosen such that the number of pixels is at least ten times larger than the number of points in the triangular mesh. We project Cylinder-Flow onto a uniform 256 × 64 grid and EAGLE onto a 256 × 128 grid (the dimensions were chosen to respect the height-width ratio of the original data). The value of the pressure and velocity fields at each point of the grid is extrapolated from the nearest point in the raw simulated data. We illustrate this projection in figure 9. While the grid-based simulation (figure 9b) seems visually more accurate than the mesh-based simulation (figure 9d), we observed that the reprojection error (i.e. the error obtained after projecting the grid-based data onto the triangular mesh) is greater near sensitive regions, for example near the scene boundaries.
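The nearest-point projection onto the grid can be sketched as follows (a brute-force numpy version for clarity; the function name and the pixel-center convention are our assumptions):

```python
import numpy as np

def mesh_to_grid(node_pos, node_vals, grid_w, grid_h, domain_w, domain_h):
    """Assign to each grid cell the field value of the nearest mesh node
    (nearest-point extrapolation). node_pos: (N, 2), node_vals: (N, C)."""
    # centers of the grid cells in physical coordinates
    xs = (np.arange(grid_w) + 0.5) * domain_w / grid_w
    ys = (np.arange(grid_h) + 0.5) * domain_h / grid_h
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    pix = np.stack([gx.ravel(), gy.ravel()], axis=1)      # (W*H, 2)
    # brute-force squared distances pixel -> node, then nearest node per pixel
    d2 = ((pix[:, None, :] - node_pos[None, :, :]) ** 2).sum(-1)
    nearest = d2.argmin(axis=1)
    return node_vals[nearest].reshape(grid_w, grid_h, -1)
```

A k-d tree would replace the brute-force distance matrix for the grid sizes used here (256 × 128).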

B MODEL DETAILS B.1 CLUSTERING

We use our own implementation of the same-size k-means algorithm described in the ELKI tutorial (https://elki-project.github.io/tutorial/same-size_k_means). Using equally sized clusters has two main advantages:
• Areas of high density will be covered by a greater number of clusters, allowing the adaptive resolution of irregular meshes to be preserved on the coarser mesh.
• The model can be implemented efficiently, maximizing parallelization, since clusters can easily be stored as batched tensors.
Since the clustering depends solely on the geometric properties of the mesh (and not on the prediction of the neural network), it is possible to apply the clustering algorithm as a pre-processing step to reduce the computational burden during training. Note that since the mesh is dynamic, so are the clusters: the 𝑘-th cluster at time 𝑡 will not necessarily contain the same points at time 𝑡+1.
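One greedy assignment step of same-size clustering can be sketched as follows (a simplified heuristic, not the exact ELKI algorithm, which also re-estimates centroids and swaps points between clusters):

```python
import numpy as np

def same_size_assign(points, centroids):
    """Assign each point to a centroid so that clusters have (near-)equal
    size: process points by distance to their closest centroid, respecting a
    per-cluster capacity of ceil(N / K)."""
    n, k = len(points), len(centroids)
    cap = -(-n // k)                                        # ceil(N / K)
    d = ((points[:, None] - centroids[None]) ** 2).sum(-1)  # (N, K)
    order = np.argsort(d.min(axis=1))                       # closest points first
    labels = np.full(n, -1)
    counts = np.zeros(k, dtype=int)
    for i in order:
        for c in np.argsort(d[i]):                          # nearest feasible cluster
            if counts[c] < cap:
                labels[i] = c
                counts[c] += 1
                break
    return labels
```

Because the assignment uses only node coordinates, it can indeed be precomputed offline for every time-step, as noted above.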

B.2 ARCHITECTURE AND TRAINING DETAILS

We kept the same training setup for all datasets and trained our model for 10,000 steps with the Adam optimizer and a learning rate of 10⁻⁴ to minimize equation 6 with 𝛼 = 10⁻¹ and 𝐻 = 8. Velocity and pressure are normalized with statistics computed on the train set, except for Scalar-Flow, where better results are obtained without normalization.

Encoder - 𝜙_node and 𝜙_edge are one-layer MLPs with ReLU activations, hidden and output size of 128 ((𝜂ᵢ, 𝑒ᵢⱼ) ∈ ℝ¹²⁸). We used 𝐿 = 4 chained graph neural network layers composed of two identical MLPs 𝜓_edge and 𝜓_node with two hidden layers of dimension 128, ReLU activated, followed by layer normalization. The positional encoding function 𝐹 is defined as:

𝐹(𝑥) = [cos(2ⁱ𝜋𝑥) ‖ sin(2ⁱ𝜋𝑥)]_{𝑖=-3,…,3}  (8)

where 𝑥 is a 2D vector modeling the position of node 𝑖.

Graph Pooling - we used a single-layer gated recurrent unit with hidden size 𝑊, followed by a single-layer MLP with hidden and output size 𝑊. This step produces a cluster feature representation 𝑤ₖ ∈ ℝ^𝑊. For Cylinder-Flow and EAGLE, 𝑊 = 512; for Scalar-Flow, 𝑊 = 128.

Attentional module - following Xiong et al. (2020), an attention block is defined as follows for an input 𝑤 ∈ ℝ^𝑊:

𝑤₁ = LN(𝑤) ‖ 𝐹(x̄ₖ)
𝑤₂ = MHA(𝑤₁, 𝑤₁, 𝑤₁)
𝑤₃ = 𝑤 + Linear(𝑤₂)
𝑤₄ = LN(𝑤₃)
𝑤₅ = MLP(𝑤₄)
𝑤₆ = 𝑤₃ + 𝑤₅

where LN is layer normalization, Linear is a linear function (with bias), MHA is multi-head attention and MLP is a multi-layer perceptron with one hidden layer of size 𝑊. We denote the barycenter of cluster 𝑘 as x̄ₖ. We used 𝑀 = 4 chained attention blocks, with four attention heads each. The last attention layer is followed by a final layer norm.

Decoder - the decoder takes as input the node embeddings 𝜂ᵢ, the cluster features 𝑤ᴹₖ updated by the attentional module, and the node-wise positional encoding 𝑓ᵢ. We applied a graph neural network composed of two identical MLPs (two hidden layers with hidden size 128, ReLU activated and layer norm).
The resulting node embeddings are fed to a final MLP with two hidden layers and hidden size of 128, with TanH as activation function.
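The positional encoding 𝐹 of equation (8) can be sketched directly (numpy version; the function name is ours):

```python
import numpy as np

def positional_encoding(x, i_min=-3, i_max=3):
    """F(x) = [cos(2^i pi x) || sin(2^i pi x)] for i = -3..3, applied
    component-wise to a 2D position x, yielding a 2 * 7 * 2 = 28-dim vector."""
    freqs = 2.0 ** np.arange(i_min, i_max + 1) * np.pi  # 7 frequencies
    ang = np.outer(freqs, x).ravel()                    # (7 * len(x),)
    return np.concatenate([np.cos(ang), np.sin(ang)])
```

At 𝑥 = 0 the encoding reduces to ones (cosines) followed by zeros (sines), a quick sanity check on the layout.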

B.3 BASELINES TRAINING DETAILS

After performing a grid search to select the best options, we found that training each baseline to minimize equation 6 with the Adam optimizer and a learning rate of 10⁻⁴ produces the best results. We vary the weighting factor 𝛼 to maintain balance between pressure and velocity. For Cylinder-Flow and EAGLE, we trained the baselines over 𝐻 = 5 time-steps; for Scalar-Flow, we set 𝐻 = 20.

MeshGraphNet - we performed a grid search over the number of GNN layers for each dataset, but the best results were obtained with the recommended depth 𝐿 = 15 in all cases. Contrary to what is suggested in Pfaff et al. (2021), we found that training MeshGraphNet over a longer horizon improves the general performance. We used our own implementation of the baseline and made sure to reproduce the results presented in the original paper (for Cylinder-Flow only). We get the best trade-off between velocity and pressure with 𝛼 = 10.

GAT - we performed a grid search over the number of heads per layer and the number of layers. Best results were obtained with 10 layers of graph attention transformer and two attention heads per layer (except for Cylinder-Flow, where four heads slightly improve the performance).

DilResNet - we found that increasing the number of blocks improves overall performance, raising the number of convolutional blocks from 4 to 20. The baselines are structurally built to predict the pressure and velocity fields at time 𝑡+ℎ on the mesh geometry at the current time 𝑡. Auto-regressive forecasting on a longer horizon thus requires interpolation of the predicted flow onto the (provided) future mesh. We do not want interpolation to disturb our problem of interest, which is turbulent flow prediction; therefore, we made the interpolation from time 𝑡 to time 𝑡+1 straightforward. As the vast majority of the mesh remains static (see previous section), only the nodes linked to the UAV need to be interpolated. Since they can readily be associated in a one-to-one relation, nearest-point interpolation can be performed automatically by assigning the predictions at these points to the corresponding nodes of the future mesh.
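This nearest-point carry-over between consecutive meshes can be sketched as follows (a brute-force numpy version; the function name is ours):

```python
import numpy as np

def carry_to_next_mesh(pred_vals, nodes_t, nodes_t1):
    """Transfer predicted fields from the mesh at time t to the mesh at
    time t+1 by nearest-point interpolation. Since most nodes are static
    and moving (UAV) nodes map one-to-one, the nearest node is the right one."""
    d2 = ((nodes_t1[:, None] - nodes_t[None]) ** 2).sum(-1)  # (N_t1, N_t)
    return pred_vals[d2.argmin(axis=1)]
```

For static nodes the mapping is the identity; only the slightly displaced UAV boundary nodes pick up their moved counterpart.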

C MORE RESULTS

C.1 DETAILED METRICS

Formally, we report the N-RMSE on the test set, i.e. the root mean squared error computed on velocity and pressure fields normalized (centered and reduced) with the standard deviations of velocity and pressure computed on the train set.

Detailed metrics - the raw root mean squared error (RMSE) on each field is reported in figure 11, as well as the temporal evolution of the N-RMSE across the prediction horizon. On Cylinder-Flow (Figure 11a), the velocity error is very similar between MeshGraphNet and our model. Our model slightly outperforms the baseline on the pressure field, yielding overall better performance. However, the temporal evolution of the N-RMSE indicates that both models converge to the same accuracy for very long roll-out prediction. On EAGLE, our model shows excellent stability over a long horizon, and produces accurate velocity and pressure estimates.

K-number - is a property of attention maps: the number of tokens required to reach 90% of the attention mass (Kervadec et al., 2021). It can be used to characterize the shape of attention maps, varying from peaky attention (requiring few tokens to reach 90%) to more uniform attention heads. We show k-numbers in Figure 10. Interestingly, the k-number maps can be compared with the attention maps in figure 5d: peaky heads (in blue) are correlated with relatively local attention maps and, conversely, more uniform heads (in red) correspond to attention maps focusing on larger distances, often following the airflow. Some heads behave differently depending on the selected cluster, being peaky in some areas (mainly around the boundaries of the scene) but more uniform elsewhere. These cues support the importance of global attention in our model.
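Both quantities can be sketched in a few lines (numpy versions; the function names are ours, and the N-RMSE shown here normalizes a single field by its train-set standard deviation):

```python
import numpy as np

def n_rmse(true, pred, sigma):
    """RMSE of a field after normalization by its train-set std."""
    return np.sqrt((((pred - true) / sigma) ** 2).mean())

def k_number(attention, mass=0.9):
    """Smallest number of tokens whose attention weights sum to `mass`
    (90% by default): sort descending, cumulate, find the crossing point."""
    srt = np.sort(attention)[::-1]
    return int(np.searchsorted(np.cumsum(srt), mass) + 1)
```

A head concentrating almost all its mass on one token has k-number 1; a more uniform head needs most of the tokens.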



Acknowledgements - we recognize support through French grants "Delicio" (ANR-19-CE23-0006) of call CE23 "Intelligence Artificielle" and "Remember" (ANR-20-CHIA0018), of call "Chaires IA hors centres".



Figure 2: Velocity field norm over time for three episodes, one for each geometry type. Turbulence is significantly different from one simulation to another and strongly depends on the ground surface.


Figure 5: Locality of reasoning. (a) velocity of an example flow and a selected point; (b) the receptive field for this point for the MeshGraphNet model (Pfaff et al., 2021) is restricted to a local neighborhood, also illustrated through the overlaid gradients ‖∇_{𝑣(𝑡)} 𝑣(𝑡+1)‖₁; (c) the receptive field of our method covers the whole field, and the gradients indicate that this liberty is exploited; (d) the attention distributions for the selected point; certain maps correlate with the airflow. Attention maps can be explored interactively using the online tool at https://eagle-dataset.github.io.

Figure 6: Ablations: GNN replaces global attention by a set of 𝐿 GNNs on the coarser mesh. One-ring constrains attention to the one-ring. Average forces uniform attention.


Figure 7: Impact of cluster size. Left: We color-code the RMSE in logarithmic scale on the velocity field near a relatively turbulent area at a horizon of ℎ=400 steps of forecasting. Right: Error (+50), inference time and FLOPs for different cluster sizes and the baselines.

Table 3 data (N-RMSE at +250); rows are the evaluated geometry, columns the geometry ablated from training (∅ = none removed):

                 Ours                           MGN
             Stp    Spl    Tri    ∅         Stp    Spl    Tri    ∅
    Stp      0.927  0.865  1.132  0.828     2.062  1.236  1.347  1.116
    Spl      0.595  0.584  0.857  0.488     1.257  0.941  1.037  0.807
    Tri      0.730  0.732  1.049  0.647     1.685  1.100  1.131  1.037

Figure 8: (a) sample of raw simulation measurements obtained on a high-resolution mesh. This single snapshot contains 158,961 nodes. (b) example of the node density map controlling the sampling disk radius. The raw mesh is denser near the boundaries and on the left side, as this sample is taken from a simulation where the drone explores the left region of the scene. (c) mesh simulation at final resolution. We drastically simplified the mesh while maintaining a satisfactory level of detail.

Figure 9: Illustration of the pixellisation process. The left column (a and b) shows snapshots of simulations from the grid-based datasets used to train DilResNet. For comparison, we show the same snapshots in the mesh-based datasets (c and d). While the resolution seems better in the grid-based simulation, it lacks precision near sensitive regions, which are essential for accurate forecasts.

Figure 10: The maps show the k-number computed for each cluster, that is, the number of nodes required to reach 90% of the attention. A low k-number indicates a very specialized head (attending to few nodes), while a high k-number indicates uniform attention.

(a) Cylinder-Flow: (Right) RMSE on velocity V and pressure P fields. (Left) Normalized RMSE over the forecasting horizon. Our mesh transformer outperforms the baselines by a small margin; yet, qualitative results tend to indicate that Cylinder-Flow is already a well-mastered task. Scalar-Flow: (Right) RMSE on velocity V and density D fields. (Left) Normalized RMSE over the forecasting horizon. Our model shows improvements over the baselines on both fields.

Figure 11: Detailed metrics on Cylinder-Flow, Scalar-Flow and EAGLE, evaluated for each baseline and our model.

Figure 14: Examples of prediction forward in time on EAGLE

Fluid mechanics datasets in the literature. To the best of our knowledge, EAGLE is the first dataset of such scale, complexity and variety. Smaller-scale datasets such as Li et al. (2008); Wu et al. (2017) have been excluded, as they favor simulation accuracy over size. The datasets in Stachenfeld et al. (2021) are not public, but can be reproduced from the information in the paper. Lagrangian representations are very suitable for applications with a reasonable number of particles, but larger-scale simulations (vehicle aerodynamic profile, sea flows, etc.) remain out of scope. In fluid mechanics, Eulerian representations are more regularly used, where the flow of quantities is studied on fixed spatial cells.



EAGLE: (Right) RMSE on velocity V and pressure P fields. (Left) Normalized RMSE over the forecasting horizon. Our model largely and consistently outperforms the competing baselines. While MeshGraphNet and DilResNet show comparable performance during the first time-steps, our model succeeds in controlling error accumulation for reasonable horizons and eventually produces better simulations.


C.2 QUALITATIVE RESULTS

Figure 12: Examples of prediction forward in time on Cylinder-Flow

Appendix A DATASET DETAILS

A.1 STRUCTURE AND POST-PROCESSING

The EAGLE dataset is composed of exactly 1,184 simulations of 990 time-steps (33 seconds at 30 fps). Scene geometries are arranged in three categories based on the order of the interpolation used to generate the ground structure: 197 Step scenes, 199 Triangular and 196 Spline. Each geometry gives two simulations depending on whether the drone crosses the left or the right part of the scene. A proper train/valid/test split is provided, ensuring that each geometry type is equally represented. The train split contains 948 simulations, while the test and valid splits each contain 118 simulations.

Simulation details - the scene is described as a 5 m × 2.5 m 2D surface. Wall boundary conditions (zero velocity) are applied to the frontiers, except for the top edge, which is an outlet (zero diffusion of flow variables). The propellers are modeled as two squares starting in the middle of the scene, with wall boundary conditions on the left, right and top edges, and an inlet condition on the bottom edge (normal velocity of intensity proportional to the rotation speed of the propeller). We mesh the scene with triangular cells of an average size of 15 mm, and add inflation near wall boundaries. We let the simulator update the mesh over time with default parameters.

Drone trajectory control - has received special care, and is obtained using model predictive control (MPC) of a dynamical model of a 2D drone, allowing realistic trajectory tracking. The model is obtained by constraining the dynamics of a 3D drone model (Romero et al., 2022) to motion in a 2D plane and reducing the number of rotors to two.
The drone can therefore move along the axes 𝑥 and 𝑦, and pivot around the 𝑧-axis perpendicular to the simulation plane, where 𝑥, 𝑦 is the 2D position of the drone and 𝜃 its orientation, Ω₁ and Ω₂ the left/right propeller rotation speeds, 𝑔 = 9.81 m/s² the gravitational acceleration, and 𝐾₁ = 10⁻⁴, 𝐾₂ = 5×10⁻⁵, 𝐾₃ = 5.5×10⁻³ physical constants depending on the drone geometry. The resulting trajectories represent physically plausible outcomes, taking into account inertia and gravity.

Mesh down-sampling - consists in simplifying the raw simulation data, as it is not suitable for direct deep learning applications and requires post-processing (see Figure 8a). The simulation software leverages a very fine-grained mesh, dynamically updated in order to accurately solve the Navier-Stokes equations. The main step thus consists in simplifying the mesh to a reasonable number of nodes. Formally, our goal is to construct a new, coarser mesh from the raw mesh proposed by the simulation software. To cope with the dynamic nature of the simulation mesh, our approach divides the target node set into a static and a dynamic part.

• The static mesh is obtained by sub-sampling the simulation point cloud using Poisson disk sampling (Cook, 1986). However, the spatial density of the point cloud evolves over time (certain areas of space are more densely populated at the end of the simulation than at the start). To preserve finer resolution near relevant regions, we thus concatenate 5 regularly spaced point clouds into a single set. We then sub-sample the resulting set by randomly selecting a point and deleting all neighbors in a sphere of radius 𝑅 around the chosen point. This operation is repeated until no point is at a distance less than 𝑅 from another. We use an adaptive radius 𝑅 correlated with the density map: when the original point cloud is dense, the radius is smaller.
Conversely, the radius increases in sparse areas. An example of the density map is provided in Figure 8b.

• The dynamic mesh is mandatory to track the drone motion accurately. We therefore complete the static mesh with a dynamical part that follows the boundaries of the UAV.

C.3 FAILURE CASES

Despite the excellent performance of our model against competitive baselines, there is still room for improvement. Some more difficult configurations give rise to very turbulent flows, widely extended in the scene. The evolution of these flows is harder to predict, and the models we evaluated fail to remain accurate. In these cases, the precision with which the small vortices are simulated is essential, because some of them will grow until they dominate the flow. Moreover, our model suffers from an error accumulation problem, like any auto-regressive model. Experimentally, we observe that the airflow tends to be smoothed by deep learning models when the prediction horizon increases.

Figure 15: We expose failure cases of our mesh transformer on EAGLE. The error increases when the flow tends to intensify throughout the scene, and when turbulence dominates. Over a longer prediction horizon, the airflow tends to be smoother and less turbulent.
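The adaptive Poisson-disk sub-sampling described in appendix A.1 can be sketched as follows (`radius_fn` is a hypothetical stand-in for the density-based radius; the function name is ours):

```python
import numpy as np

def poisson_disk_subsample(points, radius_fn, seed=0):
    """Sub-sample a point cloud: repeatedly pick a remaining point at random,
    keep it, and delete all neighbours within the adaptive radius R around it,
    until no remaining point is within R of a kept point."""
    rng = np.random.default_rng(seed)
    remaining = list(rng.permutation(len(points)))
    kept = []
    while remaining:
        i = remaining.pop(0)
        kept.append(i)
        r = radius_fn(points[i])          # smaller radius in dense regions
        remaining = [j for j in remaining
                     if np.linalg.norm(points[j] - points[i]) >= r]
    return np.array(kept)
```

With a constant radius this reduces to plain Poisson disk sampling; a density-dependent `radius_fn` preserves finer resolution in the regions the drone actually visits.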

C.4 GENERALIZATION TO DIFFERENT MESH RESOLUTION

In EAGLE, the number of points varies from one simulation to another, forcing the model to generalize to meshes of different sizes. We explicitly demonstrate the performance of our mesh transformer on this task in table 4. Four instances of the model are trained in a particular regime in which the simulation meshes are randomly down-sampled, respectively at 90%, 80%, 70% and 60% of the initial mesh resolution. These models are then evaluated in a regime different from the one used for training, either higher (more points on average during testing than during training) or lower (fewer points during testing than during training). We show that our model generalizes well to these different regimes, yielding relatively close N-RMSE measurements for a given down-sampling regime.

D EXTENSION TO 3D FLUID SIMULATION

While we believe that 3D simulations are indeed the long-term future of this subject, we argue that the diversity of factors of variation required for large-scale machine learning is currently not attainable with 3D simulations, which motivated our choice of a challenging 2D dataset. In this section, we discuss the possible extension of our dataset and of the mesh transformer method to problems in three dimensions. We address two aspects: data generation itself, and the extension of the method.

D.1 DATA GENERATION

Fluid datasets in 3D are very limited, due to the computation time required for simulation. Mesh-based simulation in three dimensions greatly increases the number of points in the mesh, and thus drastically increases the computing time (see numerical evidence in Kim (2019); Dantan et al. (2017)). Classical workarounds rely on relaxing the physical accuracy or the versatility of the solver, e.g. with SPH simulations. Accurate 3D simulations are mostly conducted on grid-based meshes, and for rather simple, theoretical problems (Mohan et al., 2020a; Chen et al., 2021b; Stachenfeld et al., 2021). The Johns Hopkins Turbulence Database (Li et al., 2008) contains nine direct numerical simulation datasets (i.e. direct resolution of the Navier-Stokes equations), but with only a single scene per dataset, simulated on a very fine grid and at low time resolution. Therefore, extending EAGLE to 3D simulations is very difficult without sacrificing one of the fundamental principles on which our dataset relies: (i) accuracy, guaranteed by the resolution of RANS equations with a demanding turbulence model on a very fine mesh, (ii) irregular meshes, which are much more versatile and widespread in engineering, and (iii) large scale, with nearly 1200 different scene configurations.

D.2 MODELS AND METHODS

Forecasting models in the literature mostly focus on 2D simulations (Li et al., 2019; Kashefi et al., 2021; Han et al., 2022; Thuerey et al., 2020). To the best of our knowledge, there is no published work on large-scale machine learning models performing flow prediction on irregular 3D meshes. Therefore, publishing a 3D version of EAGLE seems premature, not to mention the difficulty of distributing such a dataset and training models on reasonable setups. For grid-based simulation, on the other hand, a few works leverage CNN-like structures for flow prediction (Stachenfeld et al., 2021;

