LEARNING EFFICIENT HYBRID PARTICLE-CONTINUUM REPRESENTATIONS OF NON-EQUILIBRIUM N-BODY SYSTEMS

Abstract

An important class of multi-scale, non-equilibrium, N-body physical systems deals with an interplay between particle and continuum phenomena. These include hypersonic flow and plasma dynamics, materials science, and astrophysics. Hybrid solvers that combine particle and continuum representations could provide an efficient framework to model these systems. However, the coupling between these two representations has been a key challenge, which is often limited to inaccurate or incomplete prescriptions. In this work, we introduce a method for Learning Hybrid Particle-Continuum (LHPC) models from the data of first-principles particle simulations. LHPC analyzes the local velocity-space particle distribution function and separates it into near-equilibrium (thermal) and far-from-equilibrium (non-thermal) components. The most computationally-intensive particle solver is used to advance the non-thermal particles, whereas a neural network solver is used to efficiently advance the thermal component using a continuum representation. Most importantly, an additional neural network learns the particle-continuum coupling: the dynamical exchange of mass, momentum, and energy between the particle and continuum representations. Training of the different neural network components is done in an integrated manner to ensure global consistency and stability of the LHPC model. We demonstrate our method in an intense laser-plasma interaction problem involving highly nonlinear, far-from-equilibrium dynamics associated with the coupling between electromagnetic fields and multiple particle species. More efficient modeling of these interactions is critical for the design and optimization of compact accelerators for material science and medical applications. Our method achieves an important balance between accuracy and speed: LHPC is 8× faster than a classical particle solver and achieves up to 6.8-fold reduction of long-term prediction error for key quantities of interest compared to deep-learning baselines using uniform representations.

1. INTRODUCTION

The dynamics of physical systems is often nonlinear and involves the competition of different processes across a wide range of spatial and temporal scales. This gives rise to local non-equilibrium conditions (in the thermodynamic sense) that results in the failure of common numerical approaches. While continuum models (e.g., based on fluid equations) can provide accurate and computationally efficient descriptions of near-equilibrium systems at large scales, they break down when significant departures from local equilibrium are encountered (often at small scales) and give rise to important N-body phenomena. Kinetic (e.g., particle-based) numerical methods can accurately describe these non-equilibrium phenomena but are very computationally intensive, limiting their practical application to small scales. Over the last decades, this has motivated efforts to develop hybrid algorithms that can more efficiently couple continuum and particle representations in a variety of fields, including hypersonic gas dynamics (Schwartzentruber & Boyd, 2006) , high-energy-density physics (Fiuza et al., 2011) , and plasma physics (Bai et al., 2015) . Plasmas -hot ionized gases of charged particles -are a particularly challenging class of complex physics systems, where long-range electromagnetic interactions inevitably drive multi-scale and far-from-equilibrium dynamics. Indeed, plasma research associated with controlled nuclear fu- x c 6 v D B l R R h q a r k i n B W 4 y 8 T J q n V e + s e n 5 7 V q n d 5 H U U y S E 5 I i f E I x e k R q 5 J n T Q I I x l 5 J q / k z X q y X q x 3 6 2 M + W r D y n Q P y B 9 b n D 4 H 1 k 6 0 = < / l a t e x i t > x c 6 v D B l R R h q a r k i n B W 4 y 8 T J q n V e + s e n 5 7 V q n d 5 H U U y S E 5 I i f E I x e k R q 5 J n T Q I I x l 5 J q / k z X q y X q x 3 6 2 M + W r D y n Q P y B 9 b n D 4 H 1 k 6 0 = < / l a t e x i t > g fluid < l a t e x i t s h a 1 _ b a s e 6 4 = " s 1 7 + u s C f I G l / v d 8 n X q 5 d U r L 3 9 z o = " > A A A B 9 X i c b V B N S 8 N A E J 3 U r 1 q / q h 6 9 B I v g q S R S 0 W P B i 3 i q Y D + g j W V G E o g m q Z E J w F 1 9 e J q 2 z q l u r n t / W K v W b P I 4 i H M E x n I I L F 1 C H a 2 h A E y g o e I Z X e L M e r R f r 3 f q Y t x a s f O Y Q / s D 6 / A F A Q 5 M J < / l a t e x i t > g fluid < l a t e x i t s h a 1 _ b a s e 6 4 = " s 1 7 + u s C f I G l / v d 8 n X q 5 d U r L 3 9 z o = " > A A A B 9 X i c b V B N S 8 N A E J 3 U r 1 q / q h 6 9 B I v g q S R S 0 W P B i 3 i q Y D + g j W W z 2 b R L N 5 u w O 1 F L 6 P / w 4 k E R r / 4 X b / 4 b t 2 0 O 2 v p g 4 P H e D D P z / E R w j Y 7 z b R V W V t f W N 4 q b p a 3 t n d 2 9 8 v 5 B S 8 e p o q x J Y x G r j k 8 0 E 1 y y J n I U r J M o R i J f s L Y / u p r 6 7 Q e m N I / l H Y 4 T 5 k V k I H n I K U E j 3 Q / 6 P W R P m I U i 5 c G k X 6 4 4 V W c G e 5 m 4 O a l A j k a / / N U L Y p p G T C I V R O u u 6 y T o Z U Q h p 4 J N S r 1 U s 4 T Q E R m w r q G S R E x 7 2 e z q i X 1 i l M A O Y 2 V K o j 1 T f 0 9 k J N J 6 H P m m M y I 4 1 I v e V P z P 6 6 Y Y X n o Z l 0 m K T N L 5 o j A V N s b 2 N A I 7 4 I p R F G N D C F X c 3 G r T I V G E o g m q Z E J w F 1 9 e J q 2 z q l u r n t / W K v W b P I 4 i H M E x n I I L F 1 C H a 2 h A E y g o e I Z X e L M e r R f r 3 f q Y t x a s f O Y Q / s D 6 / A F A Q 5 M J < / l a t e x i t > ginject < l a t e x i t s h a 1 _ b a s e 6 4 = " b g H I 6 u D Y E 3 O 8 v H J U i r z Y 1 s n K U V 4 = " > A A A B + H i c b V D L S s N A F J 3 U V 6 2 P R l 2 6 C R b B V U l E 0 W X B j b i q Y B / Q h j C Z 3 r R j J 5 M w c y P W 0 i 9 x 4 0 I R t 3 6 K O / / G a Z u F t h 4 Y O J x z L 3 P P C V P B N b r u t 1 V Y W V 1 b 3 y h u l r a 2 d 3 b L 9 t 5 + U y e Z Y t B g i U h U O 6 Q a B J f Q Q I 4 C 2 q k C G o c C W u H w a u q 3 H k B p n s g 7 H K X g x 7 Q v e c Q Z R S M F d r k f d B E e c c z l P T C c B H b F r b o z O M v E y 0 m F 5 K g H 9 l e 3 l 7 A s B o l M U K 0 7 n p u i P 6 Y K O R M w K X U z D S l l Q 9 q H j q G S x q D 9 8 e z w i X N s l J 4 T J c o 8 i c 5 M / b 0 x p r H W o z g 0 k z H F g V 7 0 p u J / X i f D 6 N I 3 m d I M Q b L 5 R 1 E m H E y c a Q t O j y u T V o w M o U x x c 6 v D B l R R h q a r k i n B W 4 y 8 T J q n V e + s e n 5 7 V q n d 5 H U U y S E 5 I i f E I x e k R q 5 J n T Q I I x l 5 J q / k z X q y X q x 3 6 2 M + W r D y n Q P y B 9 b n D 4 H 1 k 6 0 = < / l a t e x i t > g fluid < l a t e x i t s h a 1 _ b a s e 6 4 = " s 1 7 + u s C f I G l / v d 8 n X q 5 d U r L 3 9 z o = " > A A A B 9 X i c b V B N S 8 N A E J 3 U r 1 q / q h 6 9 B I v g q S R S 0 W P sion (Zylstra et al., 2022) , advanced laser-plasma particle accelerators (Haberberger et al., 2012) , and most space and high-energy astrophysical environments (Drake et al., 2006) have long stimulated the development of hybrid particle-continuum representations to accurately and efficiently capture the nonlinear dynamics of these systems (Winske, 1985; Fiuza et al., 2011; Bai et al., 2015; Liu et al., 2022) . In all these cases, small-scale kinetic processes can accelerate a very small group of particles to energies significantly above the mean (thermal) energy, driving the system out of equilibrium. Importantly, this small group (few %) of non-thermal particles can carry away a large fraction (up to 50%) of the system energy and thus impact its global evolution. This motivated the recent development of hybrid representations that use a fluid solver to model the near-equilibrium (thermal) part of the particle distribution and a particle-based kinetic solver to model the evolution of high-energy particles (Kowal et al., 2011; Bai et al., 2015; Guidoni et al., 2016) . However, the coupling of the two representations has been based on over-simplified phenomenological prescriptions that can limit their validity and applicability. B i 3 i q Y D + g j W W z 2 b R L N 5 u w O 1 F L 6 P / w 4 k E R r / 4 X b / 4 b t 2 0 O 2 v p g 4 P H e D D P z / E R w j Y 7 z b R V W V t f W N 4 q b p a 3 t n d 2 9 8 v 5 B S 8 e p o q x J Y x G r j k 8 0 E 1 y y J n I U r J M o R i J f s L Y / u p r 6 7 Q e m N I / l H Y 4 T 5 k V k I H n I K U E j 3 Q / 6 P W R P m I U i 5 c G k X 6 4 4 V W c G e 5 m 4 O a l A j k a / / N U L Y p p G T C I V R O u u 6 y T o Z U Q h p 4 J N S r 1 U s 4 T Q E R m w r q G S R E x 7 2 e z q i X 1 i l M A O Y 2 V K o j 1 T f 0 9 k J N J 6 H P m m M y I 4 1 I v e V P z P 6 6 Y Y X n o Z l 0 m K T N L 5 o j A V N s b 2 N A I 7 4 I p R F G N D C F X c 3 G r T I V G E o g m q Z E J w F 1 9 e J q 2 z q l u r n t / W K v W b P I 4 i H M E x n I I L F 1 C H a 2 h A E y g o e I Z X e L M e r R f r 3 f q Y t x a s f O Y Q / s D 6 / A F A Q 5 M J < / l a t e x i t > u < l a t e x i t s h a 1 _ b a s e 6 4 = " o 4 u D T 5 G 0 z T H J L Z r w I T D N 6 T p I X j U = " > A A A B 8 X i c b V D L S g M x F L 1 T X 7 W + q i 7 d B I v g q s x I R Z c F N + K q g n 1 g O 5 R M e q c N z W S G J C O U o X / h x o U i b v 0 b d / 6 N a T s L b T 0 Q O J x z L z n 3 B I n g 2 r j u t 1 N Y W 9 / Y 3 C p u l 3 Z 2 9 / Y P y o d H L R 2 n i m G T x S J W n Y B q F F x i 0 3 A j s J M o p F E g s B 2 M b 2 Z + + w m V 5 r F 8 M J M E / Y g O J Q 8 5 o 8 Z K j 7 2 I m l E Q Z u m 0 X 6 6 4 V X c O s k q 8 n F Q g R 6 N f / u o N Y p Z G K A 0 T V O u u 5 y b G z 6 g y n A m c l n q p x o S y M R 1 i 1 1 J J I 9 R + N k 8 8 J W d W G Z A w V v Z J Q + b q 7 4 2 M R l p P o s B O z h L q Z W 8 m / u d 1 U x N e + x m X S W p Q s s V H Y S q I i c n s f D L g C p k R E 0 s o U 9 x m J W x E F W X G l l S y J X j L J 6 + S 1 k X V q 1 U v 7 2 u V + l 1 e R x F O 4 B T O w Y M r q M M t N K A J D C Q 8 w y u 8 O d p 5 c d 6 d j 8 V o w c l 3 j u E P n M 8 f / a q R K Q = = < / l a t e x i t > u < l a t e x i t s h a 1 _ b a s e 6 4 = " o 4 u D T 5 G 0 z T H J L Z r w I T D N 6 T p I X j U = " > A A A B 8 X i c b V D L S g M x F L 1 T X 7 W + q i 7 d B I v g q s x I R Z c F N + K q g n 1 g O 5 R M e q c N z W S G J C O U o X / h x o U i b v 0 b d / 6 N a T s L b T 0 Q O J x z L z n 3 B I n g 2 r j u t 1 N Y W 9 / Y 3 C p u l 3 Z 2 9 / Y P y o d H L R 2 n i m G T x S J W n Y B q F F x i 0 3 A j s J M o p F E g s B 2 M b 2 Z + + w m V 5 r F 8 M J M E / Y g O J Q 8 5 o 8 Z K j 7 2 I m l E Q Z u m 0 X 6 6 4 V X c O s k q 8 n F Q g R 6 N f / u o N Y p Z G K A 0 T V O u u 5 y b G z 6 g y n A m c l n q p x o S y M R 1 i 1 1 J J I 9 R + N k 8 8 J W d W G Z A w V v Z J Q + b q 7 4 2 M R l p P o s B O z h L q Z W 8 m / u d 1 U x N e + x m X S W p Q s s V H Y S q I i c n s f D L g C p k R E 0 s o U 9 x m J W x E F W X G l l S y J X j L J 6 + S 1 k X V q 1 U v 7 2 u V + l 1 e R x F O 4 B T O w Y M r q M M t N K A J D C Q 8 w X j U = " > A A A B 8 X i c b V D L S g M x F L 1 T X 7 W + q i 7 d B I v g q s x I R Z c F N + K q g n 1 g O 5 R M e q c N z W S G J C O U o X / h x o U i b v 0 b d / 6 N a T s L b T 0 Q O J x z L z n 3 B I n g 2 r j u t 1 N Y W 9 / Y 3 C p u l 3 Z 2 9 / Y P y o d H L R 2 n i m G T x S J W n Y B q F F x i 0 3 A j s J M o p F E g s B 2 M b 2 Z + + w m V 5 r F 8 M J M E / Y g O J Q 8 5 o 8 Z K j 7 2 I m l E Q Z u m 0 X 6 6 4 V X c O s k q 8 n F Q g R 6 N f / u o N Y p Z G K A 0 T V O u u 5 y b G z 6 g y n A m c l n q p x o S y M R 1 i 1 1 J J I 9 R + N k 8 8 J W d W G Z A w V v Z J Q + b q 7 4 2 M R l p P o s B O z h L q Z W 8 m / u d 1 U x N e + x m X S W p Q s s V H Y S q I i c n s f D L g C p k R E 0 s o U 9 x m J W x E F W X G l l S y J X j L J 6 + S 1 k X V q 1 U v 7 2 u V + l 1 e R x F O 4 B T O w Y M r q M M t N K A J D C Q 8 w X j U = " > A A A B 8 X i c b V D L S g M x F L 1 T X 7 W + q i 7 d B I v g q s x I R Z c F N + K q g n 1 g O 5 R M e q c N z W S G J C O U o X / h x o U i b v 0 b d / 6 N a T s L b T 0 Q O J x z L z n 3 B I n g 2 r j u t 1 N Y W 9 / Y 3 C p u l 3 Z 2 9 / Y P y o d H L R 2 n i m G T x S J W n Y B q F F x i 0 3 A j s J M o p F E g s B 2 M b 2 Z + + w m V 5 r F 8 M J M E / Y g O J Q 8 5 o 8 Z K j 7 2 I m l E Q Z u m 0 X 6 6 4 V X c O s k q 8 n F Q g R 6 N f / u o N Y p Z G K A 0 T V O u u 5 y b G z 6 g y n A m c l n q p x o S y M R 1 i 1 1 J J I 9 R + N k 8 8 J W d W G Z A w V v Z J Q + b q 7 4 2 M R l p P o s B O z h L q Z W 8 m / u d 1 U x N e + x m X S W p Q s s V H Y S q I i c n s f D L g C p k R E 0 s o U 9 x m J W x E F W X G l l S y J X j L J 6 + S 1 k X V q 1 U v 7 2 u V + l 1 e R x F O 4 B T O w Y M r q M M t N K A J D C Q 8 w In this work, we introduce a method for Learning Hybrid Particle-Continuum (LHPC) models to address the key challenge of efficiently and accurately coupling fluid and kinetic representations of far-from-equilibrium N-body systems. LHPC combines a classical particle-in-cell (PIC) solver to advance the non-thermal particle distribution with a neural network surrogate model that efficiently advances the thermal component using a continuum representation. Most importantly, our key contribution is the use of an additional neural network to learn the self-consistent particle-continuum coupling: the dynamical exchange of mass, momentum, and energy between the particle and continuum representations. This coupling is learned from the data of first-principles PIC simulations, providing an accurate and physics-informed description that addresses the main limitation of previous hybrid methods. While the combination of classical numerical solvers and deep learning has been explored before (Um et al., 2020; Vlachas et al., 2022) , these have been primarily based on uniform representations. The use of machine learning techniques to learn efficient and accurate coupling between continuum and particle representations introduced in this work is a promising and important route for addressing the multi-scale and multi-physics challenge of modeling N-body non-equilibrium systems. We demonstrate our method on a challenging non-linear, far-from-equilibrium N-body system: the interaction of an intense laser with a solid-density plasma and resulting particle acceleration. Present-day high-power lasers reach intensities in excess of 10 20 W/cm 2 , which nearly instantaneously vaporize and ionize solid-state matter upon interaction, resulting in high-energy-density plasma. These interactions give rise to nonlinear and kinetic processes that establish strong elec-tric fields in the plasma and accelerate particles to high energies over very compact distances. In the last two decades, there has been great interest in exploiting these laser-plasma based accelerators for a wide range of applications, including material science (Patel et al., 2003) , imaging (Rygg et al., 2008) , and medical therapy (Kroll et al., 2022) . Due to the need to capture detailed kinetic physics associated with these interactions, numerical modeling has relied primarily on fully kinetic PIC simulations, which are too computationally intensive -in fact, one-to-one modeling of experimental systems is computationally prohibitive even on the largest supercomputers. Effective hybrid particle-continuum representations have not been previously established for this problem. This is thus a prime example for testing our model and evaluating the ability of LHPC to learn efficient and accurate hybrid representations that can have a transformative impact in the modeling of these complex laser-plasma interactions. Our results show that LHPC achieves an important balance between accuracy and speed: LHPC is 8 times faster than a classical particle solver, and achieves up to 6.8-fold reduction of long-term prediction error for key quantities of interest when compared to deep-learning baselines using uniform representations. This opens up a novel route to develop more efficient and accurate particlecontinuum descriptions across different domains involving non-equilibrium N-body systems, and represents a key step for data-driven multi-scale algorithms for nonlinear physical systems.

2. RELATED WORKS

Deep learning has recently emerged as a powerful tool to complement Um et al. (2020) or serve as a surrogate for classical solvers (Sanchez-Gonzalez et al., 2020) . They accelerate simulation of physical systems through coarser spatial resolution (Um et al., 2020; Kochkov et al., 2021) or temporal intervals (Li et al., 2021; Vlachas et al., 2022) , explicit forward method (Tang et al., 2020; Wu et al., 2022b) , or via reduced representations (Sanchez-Gonzalez et al., 2020; Wu et al., 2022a) . Two important examples are LED (Vlachas et al., 2022) and solver-in-the-loop (Um et al., 2020) . The former optimizes the temporal evolution by employing a solver to simulate the full system in some interval of time and uses latent evolution to evolve in other parts of time, with a pre-defined alternative schedule. The latter uses a fluid solver on a coarser spatial grid and a neural network to correct for the fluid solver's error. However, almost all works employ a uniform representation, based on either all particles or all fluid. This is not appropriate when trying to optimize far-fromequilibrium, N-body systems where a particle description is too computationally expensive for the near-thermal population (where most particles reside) and a fluid description cannot accurately describe the complex distributions associated with the far-from-equilibrium population. Our method aims to address this long-standing challenge by learning an accurate and efficient coupling between different representations involving different physical processes and scales, namely a hybrid particlecontinuum representation which is critical to describe non-equilibrium N-body systems. To the best of our knowledge, machine learning methods have not yet been used to address this problem.

3. PROBLEM SETUP AND PRELIMINARIES

Task setup. In general, simulations of the time-evolution of a dynamical system can be described as follows. The state of the system at time t is s t , and there exists a ground-truth evolution g * that evolves this state: s t+1 = g * (s t ), t = 0, 1, 2, . . . The ground-truth evolution g * can either be the physical world which is challenging to predict, or a first-principles numerical solver which can be slow and expensive for large-scale systems. Assume that we have a pre-defined mapping h which maps the state s t to a representation S t : S t = h(s t ), t = 0, 1, 2, . . ., such that S t captures the essence of the system. For example, consider an original state s t represented by particles, and S t a fluid representation which describes the system with statistics of the particles within each cell (see below). Alternatively, h may be an identity mapping resulting in S t = s t . A hybrid h mapping will apply both of these representations to different groups of particles. Given the states {s t }, t = 1, 2, . . . , T , the task is to design a proper representation S t = h(s t ), and learn a surrogate model g θ , typically in terms of a neural network, which approximates the ground truth evolution of S t :  Ŝt+1 = g θ (S t ), t = 0, 1, 2, . . . ( ) Here Ŝt+1 is the model g θ 's prediction of S t+1 . Our goal is to have the long-term autoregressive evolution of the learned surrogate model Ŝt+k = g θ • ... • g θ (S t ) (composing k times) to closely match the ground-truth S t+k . Physical problem overview. In this work, we consider the interactions of intense laser pulses with plasmas, where the laser field accelerates the plasma electrons near the front surface, which in turn sets up a charge separation, generating electric (E-) and magnetic (B-) fields that accelerate and deflect the ions (Fig. 2 ; see Appendix A for details). These interactions of the charged particles with E-and B-fields (collectively EM-field) occur on ultra-fast time scales (∼ 10 -15 s), and can produce electrons and ions with energies exceeding 10 6 electronvolt (≡ 1 MeV) -with the electrons moving at close to the speed of light c. These are prime examples of non-equilibrium systems, for which computationally expensive first-principles simulations are necessary to accurately capture the associated non-linear dynamics. Particle and continuum representations. The two dominant representations of the plasma system considered in our work are particle (kinetic) and continuum (fluid) representations. A particle representation {(x t i , u t i , Q i )} describes the position x t i , momentum vector u t i , and other static properties Q i (e.g., mass or charge) of each particle i. The particles interact with each other, and change their position and momentum accordingly. In contrast, a fluid representation M t is defined on a fixed grid or mesh. The values on each vertex represent the moments (statistical averages) of the momenta of the particles residing around the vertex. These statistical quantities summarize the states of a large number of particles, permitting a more compact representation. Their physical meaning is well established. For example, the first three moments correspond to the fluid observables mass density, fluid velocity, and pressure. We note that in general an exact representation of the full par-ticle distribution requires an infinite number of fluid moments (see Appendix A.1). However, if the particle distribution is near local thermodynamic equilibrium it can be described by a compact set of fluid moments. A hybrid particle-continuum model aims to provide an efficient description of a more general distribution of particles by coupling a particle representation for far-from-equilibrium particles with a continuum representation for particles near equilibrium. Thermal and non-thermal population. In this context, we refer to the group of particles in a near equilibrium state as thermal, for which the momentum distribution can be well described by a Gaussian (i.e. requiring only the mean and variance). On the other hand, we refer to the particles that are far from equilibrium as non-thermal. These are the outliers from this Gaussian distribution. In terms of the total particle distribution function, f , this allows us to decompose it into a thermal, f T , and non-thermal, f NT components, with f = f T + f NT . See Fig. 3 for an illustration of this separation in physical problem of intense laser-plasma interactions considered in this work; details of the separation algorithm are presented in Appendix B.

4.1. METHOD OVERVIEW

Here we introduce Learning Hybrid Particle-Continuum Models (LHPC) to address the key challenges in hybrid representations of non-equilibrium dynamical systems. Specifically LHPC learns two key components, for the evolution of the fluid, and coupling between the particle and fluid representations. In Figs. 1, 3 and the following sections, we detail the data generation and preparation procedure, the main architecture, and its application to the study of intense laser-plasma interactions.

4.1.1. DATA GENERATION AND PREPARATION FROM FIRST-PRINCIPLES SIMULATIONS

Although computationally expensive, full PIC simulations provide ground truth data from firstprinciples particle-based evolution of plasma dynamics. We run full PIC simulations of laser-plasma interactions in a one spatial dimension, three velocity dimension (1D3V) setting, where the laser intensity is varied to generate multiple trajectories. A trajectory is composed of a simulation with 2000 time steps of evolution, involving 1.5 million (numerical) particles evolving on a 10k-cell grid, and is 200 GB in size (see Appendix A.3 for details). To train the two learned components of LHPC, we transform the ground truth particle data into the fluid and particle representations at each time step. This is achieved by first applying an algorithm that separates the particles into thermal and non-thermal populations (e.g. Fig. 3 ; for details see Appendix B). The labeled particle data is then transformed, with the non-thermal population keeping the original particle representation, and using the velocity moments (up to 2nd order) to represent the thermal fluid. As the system evolves, a small fraction of the particles from the thermal population will gain a significant amount of energy to get detached from the thermal distribution, becoming non-thermal. They are said to be "injected" into the non-thermal population. The transformed ground truth data therefore provides the information of the evolution of both representations, and the coupling (injection) between them.

4.1.2. ARCHITECTURE

LHPC evolves the system state S t at time t to state S t+1 at time t+1 with three components (Fig. 1): (1) A neural network g fluid,θ with parameters θ which evolves the (thermal) fluid (Fig. 1b ). (2) A ground-truth solver g * which evolves the non-thermal particles. (3) A neural network g inject,ϕ with parameters ϕ which models the interplay of the fluid and particle populations (Fig. 1c ). In particular, it models the injection of new non-thermal particles from the fluid, and updates the existing fluid and particle populations accordingly. At inference, these processes are iterated autoregressively to predict the state of the system into the long-term future.

4.2. LHPC FOR LASER-PLASMA INTERACTIONS

Here, we employ LHPC to model laser-plasma simulations. The space in x is divided into C consecutive cells: c ∈ [c left , c left+1 , . . . , c right ] in the x direction. The state S t consists of: S t := ((M t , M t , M NT t ), (x NT t , u NT t , x PIC t , u PIC t ), (E t , B t )) c : c ∈ [c l , c l+1 , . . . , c r ] where the velocity moments of different populations M and the non-thermal particles (x, u) interact through the fields (E, B). The full pipeline in Fig. 8 in Appendix C further visualizes the dependencies amongst these components and documents what each quantity represents. The following sequence of operations is used to evolve the system from time t to time t + 1: M t+1 ← g M,θ ((E, B) t , M t ) Advance fluid moments according to electromagnetic (EM-) field on the grid. g M,θ is an instantiation of the g fluid,θ introduced in Sec. 4.1.2. (1a) M t+1 NT ← g MNT,ϕ (M t+1 , M t ) Compute moments of new non-thermal particles to be injected g MNT,ϕ is an instantiation of the g inject,ϕ introduced in Sec. 4.1.2. (1b) (x, u) t+1 NT ← N (M t+1 NT ) Inject new non-thermal particles by sampling from distribution with M t+1 NT (1c) M t+1 ← (x NT , u NT , M ) t+1 Update fluid moments to conserve mass, momentum, and energy (1d) (x, u) t+1 PIC ← G((x, u) t PIC , (E, B) t ); (x, u) t+1

NT

Advance existing particles and append newly injected non-thermal particles (1e) J t+1 ← (M , x, u) t+1 Deposition of electric current J from particles and fluid to grid (1f) (E, B) t+1 ← J t+1 Advance EM-field on the grid using Maxwell's equations (1g) The operators g M,θ and g MNT,ϕ are instantiated from g fluid,θ and g inject,ϕ and are deep neural network models learned from data. The operator G and the field solver advances the non-thermal particles with the classical PIC solver. A more detailed description of each component a-g can be found in Appendix C.1, and detailed architecture, learning procedure and objective is in Appendix E.

4.2.1. BASELINES

We compare with state-of-the-art deep learning-based surrogate model of Fourier Neural Operator (FNO; Li et al. 2021) , and a baseline CNN that has the same architecture as our fluid evolution model without hybrid representation. Both architectures are evaluated on two representations: (1) an all fluid representation which models all particles as fluid defined on each cell; (2) a bi-Gaussian fluid representation, that models thermal particles and non-thermal particles separately but each as a Gaussian. Baseline (1) treats the entire system as a fluid and advances the system by learning g as: M t+1 , (E, B) t+1 = g((E, B) t , M t ). The improved "bi-Gaussian" baseline (2) models the thermal population and the non-thermal particles each as a Gaussian, and advances the system by learning g as: M t+1 , M t+1 PIC , (E, B) t+1 = g((E, B) t , M t , M t PIC ). To evaluate the importance of the coupling network g MNT,ϕ , we also compare our hybrid model with an ablation where we remove g MNT,ϕ , while keeping all other parts the same ("LHPC (nocoupling)"). For all models, we care especially about the error on the moments of the non-thermal subpopulation M PIC to understand the ability of the respective methods to correctly capture the most dynamic subpopulation of particles. This is important because in many of the relevant applications of far-from-equilibrium, N-body systems we are interested in the spectrum of accelerated particles.

5. EXPERIMENTS

In this section, we aim to answer two questions: (1) does our LHPC model with hybrid representation achieve better accuracy than learning the evolution with a pure fluid representation? (2) Does LHPC offer speedup, compared to the full-PIC solver that evolves the full system in first principles with particle representation? We evaluate LHPC and baselines in the N-body laser-plasma interactions system described above. We evaluate both single-step prediction error and auto-regressive rollout error over 50 steps, where the state of the system changes significantly. We use relative L2 as the error metric -the ratio between L2 norm of error over L2 norm of ground-truth values.

5.1. DATA AND EXPERIMENT

Both the baseline and main pipeline are trained over 9 trajectories of PIC simulation. 10 trajectories were generated that vary over the normalized laser intensity a 0 from [2, 4, . . . , 20] and trajectory 5 (a 0 = 10) is held-out for testing (see Appendix A for details). In this way, we test the generalization of the models to novel initial conditions and environment setup. Each dataset consists of T = 1500 time steps and uses the separation strategy (details in Appendix B). For the nine datasets, we train with single-step loss on the time-range [0,1400], and validate on [1400, 1500] . We iterate on hyperparameters to obtain the best individual settings for both the baseline and hybrid pipelines respectively (detailed in Appendix). Then, the best models are finetuned using the multi-step loss with data from timeranges and validated on [1400, 1500] . We report the confidence intervals of the 1-, 20-and 50-step relative L2 errors on the held-out dataset and report the results in Table 1 , which shows the error for the EM-field, the non-thermal particles (M PIC ) and the full fluid, respectively. Runtime for ground-truth (GT) solver and the compared models are also provided in Table 1 . We provide additional results in Appendix D with an ablation study on trajectory 10 with the most intense laser (a 0 = 20).

5.2. RESULTS

Error comparison. From Table 1 , we see that our LHPC typically outperforms the baselines in terms of error by a wide margin. Among the three aspects we evaluate, the EM-field error and the non-thermal particles (M PIC ) are the most important, since the EM-field is most sensitive to the local particle dynamics, and the non-thermal particles are typically the most important observable/deliverable of laser-plasma accelerators. An all-fluid description can be very efficient and accurate at describing the thermal (M ) population, as is common for traditional fluid solvers. However, it fails to capture non-thermal particle acceleration, manifested in large errors for the field evolution (Field) and non-thermal population (M PIC ). Specifically, for Field, our LHPC achieves an error reduction of 91.0%, 91.3% and 85.4% (6.8-fold) for 1-, 20-and 50-step autoregressive rollout, compared to the best models among the baselines. Similarly, in the evaluation of M PIC , our LHPC achieves an error reduction of 63.5%, 36.2%, 25.8% for 1-, 20-and 50-step predictions, respectively, demonstrating the ability of our LHPC to model accurately the dynamics of those energetic non-thermal particles. The Baseline CNN, although having the same architecture as our LHPC, cannot model accurately the non-thermal particles via either all-fluid or Bi-Gaussian representations, thus achieving much larger error. Specifically, while the use of a separate fluid description for the non-thermal population (Bi-Gaussian model) can help capture its impact on the system evolution, it still leads to significantly larger errors in the field evolution when compared to the LHPC model and cannot be used to fully describe the complex distribution of the non-thermal particles (see also Sec. 5.3) . Comparing baseline and state-of-the-art FNO, we find that FNO performs comparable or slightly worse than Baseline CNN, likely due to the simpler architecture of CNN being more appropriate to capture the local fluid evolution. The LHPC (no-coupling) ablation shows that the coupling component is essential in ensuring good long-term accuracy, especially for Field and M PIC . On Table 1 : Results for the N-body laser-plasma interactions system on the held-out dataset. We evaluate three aspects of prediction: EM-field (Field), non-thermal particles (M PIC ), and the fluid moments M . Mean and 90% confidence interval are reported. The results are average performance on the held-out dataset over 15 independent rollouts with initial conditions at t = 0, 100, . . . , 1400. Our LHPC achieves significant improvement on the EM-field predictions in both short-and long-terms. In terms of EM-field, LHPC achieves a 91.0%, 91.3%, and 85.4% (6.8-fold) reduction of error for 1-, 20-and 50-step rollouts, respectively, compared to the best-performing baseline. Similarly, for M PIC LHPC achieves error reduction of 63.5%, 36.2%, 25.8% for 1-, 20-and 50-step predictions. Note that the performance is N/A for all-fluid representation models for M PIC as they cannot model the non-thermal particles separately. For M prediction, our LHPC has comparable performance with the baselines, since both use similar architectures to model the thermal part of the distribution as fluid, but the baselines also receive M as an input whereas LHPC does not. predicting the fluid moments (M ), our LHPC model performs comparably to Baseline that has the same CNN architecture, showing both models can model reasonably the dynamics of the fluid. Runtime comparison. While achieving significant error reduction on predicting the EM-field and the non-thermal particles, our LHPC also achieves significant speedup compared to the Groundtruth (GT) full PIC solver, with a reduction of runtime by 8.0-fold. We also see that the runtime of LHPC is not much higher than full deep learning based baselines, showing that although our LHPC involves a GT solver to evolve the non-thermal particles, the increase in runtime is negligible since the non-thermal particles only constitute a small fraction of the system.

5.3. VISUALIZATION OF RESULTS

Figure 4 : Comparison of the evolution of non-thermal electron current J eNT between ground truth (Full-PIC) and LHPC across 50 steps of rollout. In this section, we visualize the predictions of LHPC. An accurate evolution of the system dynamics relies on the correct EM-field (Eq. 1g), which is dictated by the electric current J (Eq. 1f). Figure 4 shows that our model can predict the evolution of J contribution from the non-thermal population accurately over 50 time steps of rollout. This is further illustrated in Fig. 5 , where for the different contributions of J, good agreement between the predictions of our model and the ground truth is observed. Similarly, our model predictions for components of the E-field and the thermal fluid density match well the ground truth after 20 steps of rollout. These results demonstrate faithful evolution of the important quantities that describe the dynamics of the fluid, the non-thermal particles, and the coupling between them. Importantly, our model is able to accurately capture the critical aspect of laser-plasma interactions -non-thermal particle acceleration (Fig. 6 ). Note that the highly nontrivial energy distribution of the accelerated (non-thermal) particles cannot be properly described with a few fluid moments and must rely on the accurate evolution of the particle representation. 

6. DISCUSSION AND CONCLUSION

This work helps address a key challenge in modeling multi-scale, non-equilibrium, N-body systems by providing a novel hybrid particle-continuum model, whose coupling is learned from the data of first-principles simulations. LHPC outperforms the classical first-principles solver by an order of magnitude in speed, and the baseline method that models the entire system as a fluid by up to an order of magnitude in accuracy. Further speed-ups are expected in multiple dimensions as the non-thermal population constitutes a much smaller fraction of the system representation (see Appendix A.4). Our method can be extended towards finding more efficient and accurate particle-continuum descriptions across different domains involving non-equilibrium N-body systems, and represents a key step towards the development of advanced data-driven algorithms for multi-scale modeling of nonlinear physical systems.

A.3 SIMULATION PARAMETERS

In our PIC simulations we consider an intense laser interacting with a planar, solid-density target. In our 1D simulations a laser with frequency ω 0 is launched along the x direction from the left boundary and irradiates an electron-proton plasma (i.e. m i = m p = 1836 m e ). The laser pulse duration is 15 fs, with intensity 2 a 0 20. The plasma density follows a step-like profile with thickness 5 µm and electron number density 10 23 cm -3 (considering a laser wavelength of 1µm), corresponding to a solid-density target such as a metal foil. The target is simulated with 1000 particles per cell per species, and the total simulation domain of 80 µm is resolved with a spatial resolution (cell size) of 0.03 c/ω 0 . The time step is chosen according to the Courant-Friedrichs-Lewy condition, and the system is evolved over 2000 time steps (or 8500 for a 0 = 20). Periodic boundary conditions for both particles and fields are used.

A.4 COMPUTATIONAL CHALLENGES AND OPPORTUNITIES

Studying these laser-plasma interactions using the particle-in-cell (PIC) method requires resolving the fastest and smallest oscillations of the electrons in the plasmas. As a consequence, modeling these interactions with first-principles simulations is very computational demanding. For example, a 3D one-to-one simulation of laser-plasma interactions typically involves evolving the dynamics of billions of (numerical) particles on a billion-cell grid, requiring millions of CPU hours to compute. In this work, we consider only one spatial dimension x, while retaining the three dimensions in momentum. This is identical to simulating a particle distribution uniform in the y and z directions. See Appendix A for details of laser-plasma interactions and the PIC method. It is important to note that in more realistic 2D and 3D geometries a significant enhancement of speed-up can be achieved, due to the much smaller proportion of the non-thermal particles. This is a result of the finite spot size of the laser, which will only interact with a finite volume of the plasma, and accelerate a small fraction of the particles to non-thermal. In a 3D simulation of these laser-plasma interactions, the non-thermal particles compose typically < 0.1% of the total number of particles while encompassing > 50% of the system energy. Compared with the 1D simulations presented in this work ( 20%), one expects a speed-up of > 1000x in 3D.

B DETAILS FOR SEPARATION METHOD

Here we describe the details of the separation method for preparing the labeled (thermal and nonthermal) data. The algorithm separates the particles (from the original simulations) into thermal and non-thermal populations. This is done by computing locally the moments of the particle velocity distribution f (v) and considering non-thermal particles those with velocities v that exceed a threshold αv th -a given multiple (α) of the thermal velocity v th (= the velocity spread σ v for non-relativistic velocities; Cohen et al. 2010; Fiuza et al. 2011) . At each time step, we study the velocity distribution function of the particles in the local neighborhood in space N i of each particle P i . Without knowing which particles belong to the thermal population a priori, we begin by identifying a population that is likely thermal at iteration 1. The velocity spread σ v1 computed from this population provides a first estimate of the threshold ασ v , allowing us remove particles with v > ασ v from the distribution. The procedure is repeated (on the updated distribution) until a convergence of the value of the threshold is reached at iteration N . The remaining particles now constitute the thermal population, characterized by the fluid velocity v fl = v j ∀ j ∈ N and thermal velocity v th = σ v N ( • • • denotes the average value). These are the mean and sigma of the Gaussian. The corresponding threshold αv th is then used to determine the population P i belongs to. Note that only P i has been given a label, the other particles in N i were only used to calculate v fl and v th . Figure 7 illustrates an iteration in detail. From the example velocity distribution function shown in Fig. 7 (a), we recognize an absolute peak located at v abs with height h abs . Among the peaks (v µ , h µ ) ∀ h µ Xh abs for a constant X, pick the one with v µ closest to zero as the first guess of thermal population and denote the peak location as (v 0 , h 0 ) (Fig. 7b ). Search down and outwards from (v 0 , h 0 ) (i.e. in both directions of v with v + > v 0 > v -) until the heights 7c ). The population enclosed by {v|v + > v > v -} is regarded as the population to estimate the threshold. Transform all the particles in N into the local fluid frame according to v fl (Fig. 7d ; for non-relativistic velocities this amounts to v i → v i -v fl ). Calculate σ v to obtain the threshold (Fig. 7e ). h + ≡ h(v = v + ) = Y h 0 and h -≡ h(v = v -) = Y h 0 (Fig. The size of N i , values of X, Y , and α are tunable parameters determined by the physics and finetuned empirically. We have found that N i 60 cells, X = 0.8, Y = 0.1, and α = 3 and 5 for the water and laser-plasma systems lead to the most physically accurate separation. The accuracy of the algorithm is evaluated by the pressure tensor P ij of the thermal population, where P ij = v i v j -v i v j characterizes the second-order velocity moments and a is directly related to how well the distribution is described by a Gaussian. Figure 7 (f) exemplifies that indeed, for the identified thermal population of the ions for the laser-plasma system, the diagonal terms are all comparable and the off-diagonal terms are negligibly small. Namely, the velocity distribution can be described by a Gaussian. For both M or M NT , there's 20 moment directional components that needs to be predicted -10 for ions, 10 for electrons -of the zero (1), first (3), and second (6) moment orders. Thus, for both components, we instantiate 20 models with the same hyperparameter configuration in f M or f MNT . We also train Component (1) with the push-forward trick and multi-step loss, shown in Brandstetter et al. (2022) to greatly enhance capability of generalization.

C.1.2 COMPONENT (C)

In order to parameterize the distribution to sample, a three-dimensional Gaussian is taken to have mean E[u] ≈ the first order moments in M t+1 NT , and covariance matrix E [(u -E[u])(u -E[u]) T ] ∈ R 3×3 informed by the second order moments in M t+1 NT . To be able to sample, we require a valid (positive definite) covariance matrix. In order to enforce this condition, we obtain the ground-truth L using the Cholesky decomposition E [(u -E[u])(u -E[u]) T ] = LL T and train f MNT to predict L. During rollout, we approximate E[(u -E[u])(u -E[u] ) T ] ≈ L LT , and use the valid covariance matrix to sample a set of particles which is then "injected". The cumulative set of injected particles constitute the PIC loop.

C.1.3 COMPONENT (D)

After obtaining the sample of injected particles, we subtract their moments from M t+1 to obtain the thermal population's moments M t+1 for the next time step. C.1.4 COMPONENTS (E) (F) (G) Each of these components by default use the respective components from the imitation solver. The imitation solver implements the equations used in the OSIRIS PIC simulation, and is tested to match the simulation to machine precision error on any single timestep. These components run efficiently because we now only advance the much smaller sub-population of non-thermal particles (x, u) t+1 . Figure 8 illustrates the full pipeline of our method, concretized into the laser-plasma simulation. 

D ADDITIONAL RESULTS FOR MULTI-SCALE LASER-PLASMA INTERACTION

Here we provide additional results for our experiments, as shown in Tables 2 and 3 . Table 2 uses the same dataset split as Table 1 , but focus on rollout starting from t = 1200, where there is most injection and dynamical activity. From the table, we see that the conclusion is similar as Table 1 in main text: LHPC achieves significant error reduction for key quantities of interest: EM field and MPIC (energetic particles). To ablate different aspect of our model, we perform an additional experiment with a different train: test split, where the test dataset is in the future of the training (Table 3 ). We take the 10 th trajectory with the highest laser intensity that consists of T = 8480 time steps, where the training time-range is [1000, 1900] and the testing (rollout) time-range is 1900 onwards. We explore an ablation study on various strategies to use the additional data [1800, 1900] . Namely, we compare finetuning with simply retraining the model on [1000, 1900] . We also use the multi-step loss with the push-forward trick, reported by Brandstetter et al. (2022) and other works to enhance the generalization of the model, and because it can accurately mimic the input distribution during rollout.

E ADDITIONAL DETAILS FOR ARCHITECTURE AND TRAINING

We use a convolutional network (CNN) as the base architecture for modeling fluid network g fluid,θ (Fig. 9 ) and injection network g inject,ϕ . For both networks, we use the same 6 layers, with the same Method Component Error @ step 1 Error @ step 10 Error @ step 20 Error @ step 50 Speed (s/step) GT Solver (full PIC) Table 3 : Ablation study for multi-physics laser-plasma interactions for the 10 th dataset that has highest laser intensity (a 0 = 20). We compare amongst: (a) train only single-step model from [1000, 1800] vs. (b) from [1000, 1900] vs. (c) taking (a) and finetuning with single-step loss on [1800, 1900] vs (d) taking (a) and finetuning with multi-step push-forward trick loss on [1800, 1900] . We compare with the All-fluid baseline, and report result from the best hyperparameter configuration. Each cell reports the relative L2 error for the EM-field [E t , B t ], for M t . kernel sizes of 1,1,3,7,3,1, with feature sizes of 64, allowing feature extraction and local information exchange among neighboring cells. For each feature of the moments, we use a different feature head as shown in Fig. 9 . Training. We use Adam (Kingma & Ba, 2014) optimizer, with starting learning rate of 10 -3 . The training consists of two stages. In the first stage, we train both g M,θ and g M N T ,ϕ with single-step loss: L 1 = E t ( M t+1 , M t+1 ) + ( M t+1 NT , M t+1 NT ) In the second stage, we fine-tune with predicting N = 4 steps into the future with push-forward trick (Brandstetter et al. 2022 that stops the gradient on the input. The loss is given by: L 2 = E t N i=1 α i ( M t+i , M t+i ) + N i=1 α i ( M t+i NT , M t+i NT ) For both losses, we have



t e x i t s h a 1 _ b a s e 6 4 = " b g H I 6 u D Y E 3O 8 v H J U i r z Y 1 s n K U V 4 = " > A A A B + H i c b V D L S s N A F J 3 U V 6 2 P R l 2 6 C R b B V U l E 0 W X B j b i q Y B / Q h j C Z 3 r R j J 5 M w c y P W 0 i 9 x 4 0 I R t 3 6 K O / / G a Z u F t h 4 Y O J x z L 3 P P C V P B N b r u t 1 V Y W V 1 b 3 y h u l r a 2 d 3 b L 9 t 5 + U y e Z Y t B g i U h U O 6 Q a B J f Q Q I 4 C 2 q k C G o c C W u H w a u q 3 H k B p n s g 7 H K X g x 7 Q v e c Q Z R S M F d r k f d B E e c c z l P T C c B H b F r b o z O M v E y 0 m F 5 K g H 9 l e 3 l 7 A s B o l M U K 0 7 n p u i P 6 Y K O R M w K X U z D S l l Q 9 q H j q G S xq D 9 8 e z w i X N s l J 4 T J c o 8 i c 5 M / b 0 x p r H W o z g 0 k z H F g V 7 0 p u J / X i f D 6 N I 3 m d I M Q b L 5 R 1 E m H E y c a Q t O j y u T V o w M o U x

t e x i t s h a 1 _ b a s e 6 4 = " b g H I 6 u D Y E 3O 8 v H J U i r z Y 1 s n K U V 4 = " > A A A B + H i c b V D L S s N A F J 3 U V 6 2 P R l 2 6 C R b B V U l E 0 W X B j b i q Y B / Q h j C Z 3 r R j J 5 M w c y P W 0 i 9 x 4 0 I R t 3 6 K O / / G a Z u F t h 4 Y O J x z L 3 P P C V P B N b r u t 1 V Y W V 1 b 3 y h u l r a 2 d 3 b L 9 t 5 + U y e Z Y t B g i U h U O 6 Q a B J f Q Q I 4 C 2 q k C G o c C W u H w a u q 3 H k B p n s g 7 H K X g x 7 Q v e c Q Z R S M F d r k f d B E e c c z l P T C c B H b F r b o z O M v E y 0 m F 5 K g H 9 l e 3 l 7 A s B o l M U K 0 7 n p u i P 6 Y K O R M w K X U z D S l l Q 9 q H j q G S xq D 9 8 e z w i X N s l J 4 T J c o 8 i c 5 M / b 0 x p r H W o z g 0 k z H F g V 7 0 p u J / X i f D 6 N I 3 m d I M Q b L 5 R 1 E m H E y c a Q t O j y u T V o w M o U x

W z 2 b R L N 5 u w O 1 F L 6 P / w 4 k E R r / 4 X b / 4 b t 2 0 O 2 v p g 4 P H e D D P z/ E R w j Y 7 z b R V W V t f W N 4 q b p a 3 t n d 2 9 8 v 5 B S 8 e p o q x J Y x G r j k 8 0 E 1 y y J n I U r J M o R i J f s L Y / u p r 6 7 Q e m N I / l H Y 4 T 5 k V k I H n I K U E j 3 Q / 6 P W R P m I U i 5 c G k X 6 4 4 V W c G e 5 m 4 O a l A j k a / / N U L Y p p G T C I V R O u u 6 y T o Z U Q h p 4 J N S r 1 U s 4 T Q E R m w r q G S R E x 7 2 e z q i X 1 i l M A O Y 2 V K o j 1 T f 0 9 k J N J6 H P m m M y I 4 1 I v e V P z P 6 6 Y Y X n o Z l 0 m K T N L 5 o j A V N s b 2 N A I 7 4 I p R F G N D C F X c 3 G r T I

y u 8 O d p 5 c d 6 d j 8 V o w c l 3 j u E P n M 8 f / a q R K Q = = < / l a t e x i t > u < l a t e x i t s h a 1 _ b a s e 6 4 = " o 4 u D T 5 G 0 z T H J L Z r w I T D N 6 T p I

y u 8 O d p 5 c d 6 d j 8 V o w c l 3 j u E P n M 8 f / a q R K Q = = < / l a t e x i t > u < l a t e x i t s h a 1 _ b a s e 6 4 = " o 4 u D T 5 G 0 z T H J L Z r w I T D N 6 T p I

Figure 1: (a) Schematic of our LHPC architecture. It consists of three components: (1) a neural network g fluid for evolution of thermal sub-population with fluid representation; (2) a solver g * for the evolution of the non-thermal energetic sub-population with particle representation; (3) neural networks that model the injection of particles from fluid state (g inject ). (b) and (c) illustrate the architectures of the components (1) and (3).

Figure 2: (a) Intense laser-plasma interactions generate energetic particles with rich dynamics. (b) Projection of the particle dynamics onto the longitudinal momentum (u x )-position (x) phase spaces, the laser E-field (orange), and the corresponding energy spectra of the particles at two different times.

[980, 1000] ∪ [1080, 1100] ∪ [1180, 1200] ∪ [1280, 1300] ∪ [1380, 1400],

Figure 5: Comparison of ground truth (Full-PIC) and LHPC model predictions after 20 steps of rollout, for E-field components (left column) E x , E y , (middle) total density of the fluid n, and J contribution from the fluid electrons J e , (right) non-thermal ions J iNT and electrons J eNT .

Figure 7: The separation method identifies the thermal and non-thermal populations from the local velocity distribution of particles.

Figure8: Schematic of LHPC for the multi-scale laser-plasma interaction, which is the application of the general pipeline (Fig.1) to the current problem. The components correspond almost exactly to Eqs. 1a -1g except current deposit and field advance are separated.

Results for laser-plasma interactions for time range t = 1200-1250. Trajectory 5 (a 0 = 10) is held out for testing, and the model is trained on the other 9 datasets. We report performance on Dataset 5 and rollout at t = 1200, where there is the most injection and dynamical activity. As shown, the model remains stable within this chaotic, unseen time range, showing generalization to an unseen trajectory. 32E-03, 6.46E-03) (2.57E-02, 2.07E-02) (4.02E-02, 3.60E-02) (4.15E-02, 7.11E-02) (7.10E-02, 1.19E-01) 1.64E-01

A FIRST-PRINCIPLES SIMULATIONS OF LASER-PLASMA INTERACTIONS

A plasma is a collection of unbound, moving ions and electrons, interacting through the electromagnetic (EM-) fields. It can be produced through for example the irradiation of a high-intensity (I > 10 18 W/cm 2 ) laser pulse onto a solid material such as a metal foil. Through the nonlinear interplay between the electrons, ions, and the EM-fields, these laser-plasma interactions can generate high-energy (> MeV) ions for applications in material science (Patel et al., 2003) , imaging (Rygg et al., 2008) , and medical therapy (Kroll et al., 2022) . To understand these nonlinear processes, it is necessary to use kinetic treatment of the plasma.A.1 PARTICLE AND CONTINUUM DESCRIPTIONS OF PLASMAS One example of kinetic treatment of plasma is the particle-in-cell (PIC) method (Dawson, 1983; Birdsall & Langdon, 1991) . In this method we solve the Klimontovich equation (Klimontovich 1967) for finite size particles, coupled to Maxwell's equations for the EM-fields. The numerical procedure consists of solving Maxwell's equations on a spatial grid using the current and charge densities that are obtained by weighting the discrete plasma particles onto the grid. The particles are then advanced via the Lorentz force associated with the EM-fields. This particle-based simulation technique captures the kinetic microphysics of plasmas, and to the extent that quantum mechanical effects can be neglected, provides a first-principles description of plasma dynamics.A plasma can also be described exactly as a continuum, by fluid equations -the evolution of the velocity moments of the distribution function f of the plasma particles (M N (xis the N -th order velocity moment). The fluid equations can be derived from the kinetic equations and form an infinite hierarchy of exact coupled conservation equations for each fluid moment (with the n-th moment depending explicitly on the (n + 1)-th). In practice, this infinite hierarchy needs to be truncated after the first few moments, through the so-called closure relation -a relation that expresses the evolution of the highest-order moment considered in terms of the lower-order moments.For a plasma near local thermodynamic equilibrium, such as the thermal population of the electrons and ions studied in our work, these closures allow us to describe the plasma self-consistently at larger spatial and temporal scales (relative to the PIC scales), using a finite number of moments. Indeed, we use the first three moments -the mass, momentum, and energy densities of the particles -to describe the dynamics of the thermal fluids, and learn their coupling with the kinetic (non-thermal) particles and the EM-fields.

A.2 LASER-PLASMA INTERACTIONS

The generation of high-energy ion beams from intense laser-solid interactions has been an active area of research due to the potential of producing high-energy, high-charge, high-current ion beams in much more compact systems than solid-state based accelerators (e.g. linear accelerators, cyclotrons). The enormous accelerating gradients (TV/m; ∼5 orders of magnitude larger than solid-state based accelerators) can be sustained in a plasma (Wilks et al., 2001) , accelerating ions to MeV energies within millimeters.For these laser intensities it is useful to define the normalized vector potential a 0 0.85 I[W/cm 2 ](λ 0 [µm]) 2 /10 18 , where λ 0 is the laser wavelength. The laser electric field can accelerate electrons to relativistic speeds in one cycle if a 0 > 1. For ion acceleration, a solid-density target is typically used, for which the laser E-field accelerates the electrons in the small, skin depth, layer near the front surface, producing very energetic electrons. In addition to the acceleration near the front surface, these energetic electrons cross the dense target and escape into the vacuum on the rear side, setting up a strong charge separation E-field that accelerates the ions from the back surface. As a result, the ions will be laminar and possess small divergence angle, and exhibit an energy spectrum typically characterized by an exponentially decreasing distribution (Snavely et al., 2000; Wilks et al., 2001) .M t+i NT = g MNT,ϕ sg( M t+i-1 ), sg( M t+i-1 ) , i = 1, 2, ...N (5)Notice the hat notation • in the input arguments, denoting that they are autoregressive prediction of the LHPC at the previous time step, but the gradient is stopped (the "sg" notation). Essentially, we rollout the pipeline of LHPC for multiple steps, and use it to provide a realistic input that contains the rollout error, requiring the model to not only predict well, but able to adapt to noise due to the rollout error. We find that this significantly improve the long-term prediction performance. For the coefficient of the multi-step loss in Eq. 4, we set (α 1 , α 2 , α 3 , α 4 ) = (1, 0.5, 0.1, 0.1) with decreasing weight for longer time steps. This put more emphasis on the single-step loss to make the training more stable, and also have weight on longer-term future to improve long-term prediction.For the loss function (•, •) in Eq. 3 and 4, we useWe find that this achieves a better performance than the alternative of MSE loss (with exponent of 2) and MAE loss (with exponent of 1). MSE will give very small gradient if the loss is small, and not able to encourage that the prediction to be exactly 0 in the vacuum. On the other hand, MAE is harder to train and do not penalize more for larger errors. Our choice of loss function (•, •) strikes a good balance between the two and enjoys the benefit of both loss functions.For single-step and multi-step training, we both train 500 epochs, with and cosine learning rate scheduling (Loshchilov & Hutter, 2016) . 

