NEURAL TIME-DEPENDENT PARTIAL DIFFERENTIAL EQUATION

Abstract

Partial differential equations (PDEs) play a crucial role in studying a vast number of problems in science and engineering. Numerically solving nonlinear and/or high-dimensional PDEs is frequently a challenging task. Inspired by the traditional finite difference and finite element methods and by emerging advancements in machine learning, we propose a sequence-to-sequence (Seq2Seq) learning framework called Neural-PDE, which allows one to automatically learn the governing rules of any time-dependent PDE system from existing data by using a bidirectional LSTM encoder, and to predict the solutions at the next n time steps. One critical feature of our proposed framework is that the Neural-PDE is able to simultaneously learn and simulate all variables of interest in a PDE system. We test the Neural-PDE on a range of examples, from one-dimensional PDEs to a multi-dimensional and nonlinear complex fluids model. The results show that the Neural-PDE is capable of learning the initial conditions, boundary conditions and differential operators defining the initial-boundary-value problem of a PDE system, without knowledge of the specific form of the PDE system. In our experiments, the Neural-PDE efficiently extracts the dynamics within 20 epochs of training and produces accurate predictions. Furthermore, unlike traditional machine learning approaches for learning PDEs, such as CNNs and MLPs, which require a large number of parameters for model precision, the Neural-PDE shares parameters across all time steps, which considerably reduces computational complexity and leads to a fast learning algorithm.

1. INTRODUCTION

The study of time-dependent partial differential equations (PDEs) is regarded as one of the most important disciplines in applied mathematics. PDEs appear ubiquitously in a broad spectrum of fields including physics, biology, chemistry, and finance, to name a few. Despite their fundamental importance, most PDEs cannot be solved analytically and must be treated with numerical methods. Developing efficient and accurate numerical schemes for solving PDEs has therefore been an active research area over the past few decades (Courant et al., 1967; Osher & Sethian, 1988; LeVeque; Cockburn et al., 2012; Thomas, 2013; Johnson, 2012). Still, devising stable and accurate schemes with acceptable computational cost is a difficult task, especially when nonlinear and/or high-dimensional PDEs are considered. Additionally, PDE models emerging from science and engineering disciplines usually require large amounts of empirical data for model calibration and validation, and determining the multidimensional parameters in such a PDE system poses another challenge (Peng et al., 2020). Deep learning is considered a state-of-the-art tool for the classification and prediction of nonlinear inputs such as images, text, and speech (Litjens et al., 2017; Devlin et al., 2018; LeCun et al., 1998; Krizhevsky et al., 2012; Hinton et al., 2012). Recently, considerable efforts have been made to employ deep learning tools in designing data-driven methods for solving PDEs (Han et al., 2018; Long et al., 2018; Sirignano & Spiliopoulos, 2018; Raissi et al., 2019). Most of these approaches are based on fully-connected neural networks (FCNNs), convolutional neural networks (CNNs) and multilayer perceptrons (MLPs). These network structures usually require additional layers to improve predictive accuracy (Raissi et al., 2019), which leads to more complicated models with extra parameters. Recurrent neural networks (RNNs) are another type of neural network architecture.
RNNs predict the value at the next time step from the inputs of the current and previous states, and share parameters across all inputs. This idea (Sherstinsky, 2020) of using the current and previous states to calculate the state at the next time step is not unique to RNNs; in fact, it is ubiquitous in numerical PDEs. Almost all time-stepping methods for time-dependent PDEs, such as Euler's, Crank-Nicolson, high-order Taylor and its variants such as Runge-Kutta (Ascher et al., 1997), update the numerical solution using solutions from previous steps. This motivates us to ask what happens if we replace the previous-step data in the neural network with numerical solution data for a PDE supported on a grid. It is possible that the neural network behaves like a time-stepping method; for example, forward Euler's method yields the numerical solution at a new time point as the output of the current state (Chen et al., 2018). Since the numerical solution at each grid point (for finite differences) or grid cell (for finite elements), computed at a set of contiguous time points, can be treated as one time sequence of neural network inputs, a deep learning framework with a bidirectional structure (Huang et al., 2015; Schuster & Paliwal, 1997) can be trained to predict any time-dependent PDE from time series data supported on a grid. In other words, the supervised training process can be regarded as the deep learning framework learning the numerical solution from the input data through the coefficients of the network layers. Long Short-Term Memory (LSTM) (Hochreiter & Schmidhuber, 1997) is a neural network built upon RNNs. Unlike vanilla RNNs, which suffer from losing long-term information and a high probability of vanishing or exploding gradients, LSTM has a specifically designed memory cell with a set of new gates, such as the input gate and forget gate.
Equipped with these gates, which control when information is preserved and passed on, LSTM is capable of learning long-term dependencies without the danger of vanishing or exploding gradients. Over the past two decades, LSTM has been widely used in natural language processing (NLP), for tasks such as machine translation, dialogue systems, and question answering (Lipton et al., 2015). Inspired by numerical PDE schemes and the LSTM network, we propose a new deep learning framework, denoted Neural-PDE. It simulates multi-dimensional governing laws, represented by time-dependent PDEs, from time series data generated on a grid, and predicts the data at the next n time steps. The Neural-PDE processes related data from all spatial grid points via the bidirectional (Schuster & Paliwal, 1997) structure, which safeguards the accuracy of the numerical solution and makes it feasible to learn any time-dependent PDE. The detailed structure of the Neural-PDE and the data normalization are introduced in Section 3. The rest of the paper is organized as follows. Section 2 briefly reviews the finite difference method for solving PDEs. Section 3 contains a detailed description of the design of the Neural-PDE. In Section 4 and Appendix A, we apply the Neural-PDE to four different PDEs, including the 1-dimensional (1D) wave equation, the 2-dimensional (2D) heat equation, and two systems of PDEs: the inviscid Burgers' equations and a coupled Navier-Stokes-Cahn-Hilliard system, which appears widely in multiscale modeling of complex fluid systems. We demonstrate the robustness of the Neural-PDE, which achieves convergence within 20 epochs with an admissible mean squared error, even when Gaussian noise is added to the input data.

2.1. TIME-DEPENDENT PARTIAL DIFFERENTIAL EQUATIONS

A time-dependent partial differential equation is an equation of the form:

u_t = f(x_1, ..., x_n, u, ∂u/∂x_1, ..., ∂u/∂x_n, ∂²u/∂x_1∂x_1, ..., ∂²u/∂x_1∂x_n, ..., ∂ⁿu/∂x_1···∂x_n) , (2.1.1)

where u = u(t, x_1, ..., x_n) is the unknown function, the x_i ∈ R are spatial variables, and f is an operator acting on u and its spatial derivatives. For example, consider the parabolic heat equation u_t = α²∆u, where u represents the temperature and f is a multiple of the Laplacian operator ∆. Eq. (2.1.1) can be solved by finite difference methods, which are briefly reviewed below for the self-completeness of the paper.
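As a quick numerical illustration of the heat-equation example above (the constants a and k below are our own arbitrary choices, not from the paper), one can verify with finite differences that a separated solution satisfies a PDE of the form (2.1.1):

```python
import numpy as np

# Sanity check: u(x, t) = exp(-a^2 k^2 t) sin(k x) solves the 1D heat
# equation u_t = a^2 u_xx, the example given for Eq. (2.1.1).
a, k = 0.7, 3.0
u = lambda x, t: np.exp(-a**2 * k**2 * t) * np.sin(k * x)

x0, t0, h = 0.4, 0.2, 1e-4
u_t  = (u(x0, t0 + h) - u(x0, t0 - h)) / (2 * h)               # central difference in t
u_xx = (u(x0 + h, t0) - 2 * u(x0, t0) + u(x0 - h, t0)) / h**2  # central difference in x
residual = u_t - a**2 * u_xx   # should vanish up to O(h^2)
```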

2.2. FINITE DIFFERENCE METHOD

Consider using a finite difference method (FDM) to solve a two-dimensional second-order PDE of the form:

u_t = f(x, y, u_x, u_y, u_xx, u_yy), (x, y) ∈ Ω ⊂ R², t ∈ R⁺ ∪ {0} , (2.2.1)

with some proper boundary conditions. Let Ω = [x_a, x_b] × [y_a, y_b] and

u^n_{i,j} = u(x_i, y_j, t_n) , (2.2.2)

where t_n = nδt, 0 ≤ n ≤ N, with δt = T/N for t ∈ [0, T] and some large integer N; x_i = iδx, 0 ≤ i ≤ N_x, with δx = (x_b − x_a)/N_x for x ∈ [x_a, x_b]; and y_j = jδy, 0 ≤ j ≤ N_y, with δy = (y_b − y_a)/N_y for y ∈ [y_a, y_b], where N_x and N_y are integers. The central difference method approximates the spatial derivatives as follows (Thomas, 2013):

u_x(x_i, y_j, t) = (1/(2δx))(u_{i+1,j} − u_{i−1,j}) + O(δx²) , (2.2.3)
u_y(x_i, y_j, t) = (1/(2δy))(u_{i,j+1} − u_{i,j−1}) + O(δy²) , (2.2.4)
u_xx(x_i, y_j, t) = (1/δx²)(u_{i+1,j} − 2u_{i,j} + u_{i−1,j}) + O(δx²) , (2.2.5)
u_yy(x_i, y_j, t) = (1/δy²)(u_{i,j+1} − 2u_{i,j} + u_{i,j−1}) + O(δy²) . (2.2.6)

To this end, the explicit time-stepping scheme to update the next-step solution u^{n+1} is given by:

u^{n+1}_{i,j} ≈ U^{n+1}_{i,j} = U^n_{i,j} + δt f(x_i, y_j, U^n_{i,j}, U^n_{i,j−1}, U^n_{i,j+1}, U^n_{i+1,j}, U^n_{i−1,j}) (2.2.7)
≡ F(x_i, y_j, δx, δy, δt, U^n_{i,j}, U^n_{i,j−1}, U^n_{i,j+1}, U^n_{i+1,j}, U^n_{i−1,j}) . (2.2.8)

Figure 1: updating scheme for the central difference method.

Apparently, the finite difference scheme (2.2.7) for updating u^{n+1} at a grid point relies on the previous time step's solution at the grid point and its neighbours: it updates u^{n+1}_{i,j} using the u^n values at the point and its four neighbours (see Figure 1).
Similarly, the finite element method (FEM) approximates the new solution by computing the corresponding mesh-cell coefficients (see Appendix), each updated from the related nearby coefficients on the mesh. From this perspective, one may regard numerical schemes for solving time-dependent PDEs as methods that gather information from the neighbourhood data of interest.
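To make the stencil concrete, here is a minimal NumPy sketch (our own illustration, not the paper's code) of one explicit step of scheme (2.2.7) for the special case u_t = u_xx + u_yy, using the central differences (2.2.5)–(2.2.6):

```python
import numpy as np

# One explicit step of scheme (2.2.7) for the 2D heat equation u_t = u_xx + u_yy.
# The five-point stencil of Figure 1 is visible in the slicing pattern.
def step(U, dx, dy, dt):
    V = U.copy()
    V[1:-1, 1:-1] = U[1:-1, 1:-1] + dt * (
        (U[2:, 1:-1] - 2 * U[1:-1, 1:-1] + U[:-2, 1:-1]) / dx**2
        + (U[1:-1, 2:] - 2 * U[1:-1, 1:-1] + U[1:-1, :-2]) / dy**2
    )
    return V  # boundary rows/columns are left unchanged (Dirichlet data)

# Usage: diffuse a hot spot; dt satisfies the explicit stability restriction.
U = np.zeros((21, 21)); U[10, 10] = 1.0
for _ in range(50):
    U = step(U, dx=0.05, dy=0.05, dt=1e-4)
```

Note that the update at every interior point reads only the point itself and its four neighbours at the previous time step, exactly the locality property the Neural-PDE is designed to mimic.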

3.1. MATHEMATICAL MOTIVATION

A recurrent neural network (RNN), including LSTM, is an artificial neural network structure of the form (Lipton et al., 2015):

h_t = σ(W_hx x_t + W_hh h_{t−1} + b_h) ≡ σ_a(x_t, h_{t−1}) ≡ σ_b(x_0, x_1, x_2, ..., x_t) , (3.1.1)

where x_t ∈ R^d is the input data of the t-th state and h_{t−1} ∈ R^h denotes the value processed by the hidden layers in the previous state. The output y_t of the current state is computed from the current hidden value h_t:

y_t = σ(W_hy h_t + b_y) (3.1.2)
≡ σ_c(h_t) ≡ σ_d(x_0, x_1, x_2, ..., x_t) . (3.1.3)

Here W_hx ∈ R^{h×d}, W_hh ∈ R^{h×h} and W_hy ∈ R^{h×h} are weight matrices, the vectors b_h, b_y ∈ R^h are bias terms, and σ, σ_a, σ_b, σ_c, σ_d are the corresponding activation and composite mapping functions. With a proper design of the input and forget gates, LSTM yields better control over the gradient flow and better preserves useful information from long-range dependencies (Graves & Schmidhuber, 2005).

Now consider a temporally continuous vector function u ∈ R^n given by an ordinary differential equation of the form:

du(t)/dt = g(u(t)) . (3.1.4)

Let u^n = u(t = nδt). A forward Euler method for solving u is easily derived from Taylor's theorem, which gives the following first-order accurate approximation of the time derivative:

du^n/dt = (u^{n+1} − u^n)/δt + O(δt) . (3.1.5)

Then we have:

du/dt = g(u)  —(3.1.5)→  u^{n+1} = u^n + δt g(u^n) + O(δt²)  →  û^{n+1} = f_1(û^n) = f_1 ∘ f_1 ∘ ··· ∘ f_1(û^0)  (n compositions) , (3.1.6)

where û^n ≈ u(nδt) is the numerical approximation and f_1 ≡ u^n + δt g(u^n) : R^n → R^n. Thus, what happens if we replace the temporal inputs h_{t−1} and x_t with spatial information? A simple sketch with the upwind method for a 1d example u(x, t):

u_t + νu_x = 0 (3.1.7)

gives:

û^{n+1}_i = û^n_i − ν (δt/δx)(û^n_i − û^n_{i−1}) + O(δx, δt)  →  û^{n+1}_i = f_2(û^n_{i−1}, û^n_i) (3.1.8)

≡ f_θ(f_η(x_i, h_{i−1}(û))) = f_{θ,η}(û^n_0, û^n_1, ..., û^n_{i−1}, û^n_i) = v^{n+1}_i , (3.1.9)

x_i = û^n_i , h_{i−1}(û) = σ(û^n_{i−1}, h_{i−2}(û)) ≡ f_η(û^n_0, û^n_1, û^n_2, ..., û^n_{i−1}) . (3.1.10)

Here v^{n+1}_i denotes the prediction of û^{n+1}_i produced by the neural network.
We replace the temporal previous state h_{t−1} with the spatial grid value h_{i−1} and input the numerical solution û^n_i ≈ u(iδx, nδt) as the current state value, so that the neural network can be seen as a forward Euler method for equation (3.1.7) (Lu et al., 2018). Here f_2 ≡ û^n_i − ν(δt/δx)(û^n_i − û^n_{i−1}) : R → R; the function f_θ represents the dynamics of the hidden layers in the decoder, with parameters θ, and f_η specifies the dynamics of the LSTM layer (Hochreiter & Schmidhuber, 1997; Graves & Schmidhuber, 2005) in the encoder, with parameters η. The function f_{θ,η} simulates the dynamics of the Neural-PDE with parameters θ and η. By applying a bidirectional neural network, all grid data are transferred, enabling the LSTM to simulate the PDE as:

v^{n+1}_i = f_θ(f_η(h_{i+1}(û), û^n_i, h_{i−1}(û))) , (3.1.11)
h_{i+1}(û) ≡ f_η(û^n_{i+1}, û^n_{i+2}, û^n_{i+3}, ..., û^n_k) . (3.1.12)

For a time-dependent PDE, if we map all the grid data into an input matrix which contains the information of δx and δt, then the neural network regresses such coefficients as constants, and learns and filters the physical rules from the data of all k mesh grid points:

v^{n+1}_i = f_{θ,η}(û^n_0, û^n_1, û^n_2, ..., û^n_k) . (3.1.13)

The LSTM network is designed to overcome the vanishing gradient issue through its hidden layers; we therefore use this recurrent structure to increase the stability of the numerical approach in deep learning. The highly nonlinear function f_{θ,η} simulates the dynamics of the updating rule for u^{n+1}_i, which works in a way similar to a finite difference method (Section 2.2) or a finite element method. In particular, we use a bidirectional LSTM (Hochreiter & Schmidhuber, 1997; Graves & Schmidhuber, 2005) to better retain state information from grid points that are neighbours in the mesh but far apart in the input matrix.
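The analogy can be made concrete in a few lines of NumPy (an illustrative sketch, not the paper's implementation): the upwind map f_2 of (3.1.8) is applied with the same fixed "parameters" ν δt/δx at every step, just as an RNN reuses one cell across all time steps:

```python
import numpy as np

# The upwind update (3.1.8) applied as a recurrence: the same map f2 (with a
# fixed coefficient c = nu*dt/dx) is reused at every step, like a shared RNN cell.
def f2(u_left, u_center, c):
    return u_center - c * (u_center - u_left)

def upwind_step(u, c):
    v = u.copy()
    v[1:] = f2(u[:-1], u[1:], c)   # each node uses itself and its left neighbour
    return v

# Usage: advect a bump to the right with speed nu = 1.
x = np.linspace(0.0, 1.0, 101)
u = np.exp(-200 * (x - 0.3) ** 2)
c = 0.5                            # nu*dt/dx = 0.5 satisfies the CFL condition
for _ in range(40):                # 40 steps -> the bump travels 40*c*dx = 0.2
    u = upwind_step(u, c)
```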
The right frame of Figure 3 shows the overall design of the Neural-PDE: each collocation point contributes one input sequence û^0_i, û^1_i, ..., û^N_i and one predicted output sequence v^{N+1}_i, v^{N+2}_i, ..., v^{N+M}_i. Denote the time series data at the collocation points by {a^N_1, a^N_2, ..., a^N_k}, where a^N_i = [û^0_i, û^1_i, ..., û^N_i] is the series at the i-th point and the superscript indexes time. The Neural-PDE takes the past states {a^N_1, a^N_2, ..., a^N_k} of all collocation points and outputs the predicted future states {b^M_1, b^M_2, ..., b^M_k}, where b^M_i = [v^{N+1}_i, v^{N+2}_i, ..., v^{N+M}_i] is the Neural-PDE prediction for the i-th collocation point at time points N+1 to N+M. The data from time point 0 to N form the training set. The Neural-PDE is an encoder-decoder style sequence model that first maps the input data to a low-dimensional latent space:

h_i = →LSTM(a_i) ⊕ ←LSTM(a_i) , (3.2.1)

where ⊕ denotes concatenation and h_i is the latent embedding of point a_i. A decoder, another bidirectional LSTM followed by a dense layer, then produces the prediction:

v_i = (→LSTM(h_i) ⊕ ←LSTM(h_i)) · W , (3.2.2)

where W is the learnable weight matrix of the dense layer. During training, the mean squared error (MSE) is used as the loss L, since we typically do not know the specific form of the PDE:

L = Σ_{t=N+1}^{N+M} Σ_{i=1}^{k} ||û^t_i − v^t_i||² . (3.2.3)

3.3. DATA INITIALIZATION AND GRID POINT RESHAPE

In order to feed data into our sequence model framework, we map the PDE solution data onto a K × N matrix, where K ∈ Z⁺ is the number of grid points and N ∈ Z⁺ is the length of the time series on each grid point. No particular ordering of the grid-point rows is required, because of the bidirectional structure of the Neural-PDE. For example, the solution of a 2d heat equation at some time t is reshaped into a 1d vector (see Fig. 2), and the matrix is formed accordingly. For an n-dimensional time-dependent PDE with K collocation points, the input and output data for t ∈ (0, T) take the form:

A(K, N) = [a^N_1; ...; a^N_i; ...; a^N_K], with row i given by [û^0_i, û^1_i, ..., û^n_i, ..., û^N_i] , (3.3.1)

B(K, M) = [b^M_1; ...; b^M_i; ...; b^M_K], with row i given by [v^{N+1}_i, v^{N+2}_i, ..., v^{N+m}_i, ..., v^{N+M}_i] . (3.3.2)

Here N = T/δt, each row represents the time series at the i-th mesh grid point, and M is the time length of the predicted data. With the bidirectional LSTM encoder, the Neural-PDE automatically extracts the information from the time series data as:

B(K, M) = PDESolver(A(K, N)) = PDESolver(a^N_1, ..., a^N_i, ..., a^N_K) . (3.3.3)
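The reshaping into (3.3.1) can be sketched in a few lines of NumPy (the function name is ours):

```python
import numpy as np

# Build the input matrix A of (3.3.1): each row is the full time series of one
# grid point. A 2D solution snapshot of shape (nx, ny) at each of N+1 times is
# flattened, so K = nx*ny rows index the (arbitrarily ordered) grid points.
def to_input_matrix(snapshots):
    # snapshots: array of shape (N+1, nx, ny)
    n_t = snapshots.shape[0]
    return snapshots.reshape(n_t, -1).T   # shape (K, N+1)

snaps = np.random.rand(31, 10, 10)        # 31 time points on a 10x10 grid
A = to_input_matrix(snaps)
# A[i, n] is the solution at grid point i and time n*dt
```

Because the bidirectional encoder is order-agnostic, any consistent flattening of the spatial indices gives an equally valid row ordering.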

4. COMPUTER EXPERIMENTS

The Neural-PDE is a sequence-to-sequence learning framework, so one can predict over any time period from the given data, and may test the Neural-PDE with different permutations of training and prediction periods to assess its efficiency, robustness and accuracy. In the following examples, the whole dataset is randomly split into 80% for training and 20% for testing. We predict the solution over the next t_p ∈ [31δt, 40δt] using the previous t_tr ∈ [0, 30δt] data:

B(K, 10) = PDESolver(A(K, 30)) . (4.0.1)

EXAMPLE: 2D INVISCID BURGERS' EQUATIONS

Consider the two-dimensional inviscid Burgers' equations:

u_t + u u_x + v u_y = 0 ,
v_t + u v_x + v v_y = 0 , (4.0.2)

(x, y) ∈ Ω = [0, 1] × [0, 1], t ∈ [0, 1] , (4.0.3)

with initial and boundary conditions:

u(0.25 ≤ x ≤ 0.75, 0.25 ≤ y ≤ 0.75, t = 0) = 0.9 , (4.0.4)
v(0.25 ≤ x ≤ 0.75, 0.25 ≤ y ≤ 0.75, t = 0) = 0.5 , (4.0.5)
u(0, y, t) = u(1, y, t) = v(x, 0, t) = v(x, 1, t) = 0 . (4.0.6)

The inviscid Burgers' equations are hard to handle in numerical PDEs because of the discontinuities (shock waves) that develop in the solutions. We use an upwind finite difference scheme to create the training data and put both velocity components u, v into the input matrix. With δx = δy = 10⁻² and δt = 10⁻³, our empirical results (see Figure 4) show that the Neural-PDE is able to learn the shock waves, the boundary conditions and the rules of the equations, and to predict u and v simultaneously with an overall MSE of 1.4018 × 10⁻⁵. Heat maps of the exact and predicted solutions are shown in Figure 5.
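As a 1D illustration of the kind of upwind data generation described above (the paper's actual solver is 2D; this sketch and its initial profile are our own), the nonlinear flux makes a smooth profile steepen toward a shock:

```python
import numpy as np

# Upwind scheme for the 1D inviscid Burgers' equation u_t + u u_x = 0
# (u > 0, so the wind blows rightward); a smooth profile steepens into a shock.
def burgers_step(u, dt, dx):
    v = u.copy()
    v[1:] = u[1:] - u[1:] * dt / dx * (u[1:] - u[:-1])
    return v

dx, dt = 1e-2, 1e-3                    # the step sizes used in the experiment
x = np.arange(0.0, 1.0 + dx, dx)
u = 0.5 + 0.4 * np.sin(2 * np.pi * x)  # smooth initial profile
for _ in range(300):                   # integrate to t = 0.3
    u = burgers_step(u, dt, dx)
# the steepest (negative) slope grows sharply as the shock forms
```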

EXAMPLE: MULTISCALE MODELING: COUPLED CAHN-HILLIARD-NAVIER-STOKES SYSTEM

Finally, let us consider the following 2d Cahn-Hilliard-Navier-Stokes system, widely used for modeling complex fluids:

u_t + u · ∇u = −∇p + ν∆u − φ∇µ , (4.0.7)
φ_t + ∇ · (uφ) = M∆µ , (4.0.8)
µ = λ(−∆φ + (φ/η²)(φ² − 1)) , (4.0.9)
∇ · u = 0 . (4.0.10)

In this complicated example we use the following initial condition:

φ(x, y, 0) = ½(1 − tanh(50(f_1 − 0.1))) + ½(1 − tanh(50(f_2 − 0.1))) , (4.0.11)
f_1 = √((x + 0.12)² + y²) , f_2 = √((x − 0.12)² + y²) , (4.0.12)

with x ∈ [−0.5, 0.5], y ∈ [−0.5, 0.5], t ∈ [0, 1], M = 0.1, ν = 0.01 . (4.0.13)

This fluid system can be derived by the energetic variational approach (Forster, 2013). The complex fluids system has the following features: micro-structures such as molecular configurations, interactions between different scales, and competition between multi-phase fluids (Hyon et al., 2010). Here u is the velocity and φ(x, y, t) ∈ [0, 1] denotes the volume fraction of one fluid phase; M is the diffusion coefficient and µ is the chemical potential of φ. Equation (4.0.10) expresses the incompressibility of the fluid. Solving such a PDE system is notoriously difficult because of its high nonlinearity and its multi-physical, coupled features. One may solve it efficiently with the decoupled projection method (Guermond et al., 2006), or with an implicit method, which is however computationally expensive. Another challenge for deep learning on a system like this is how to process the data to improve learning efficiency when the input matrix mixes variables of large magnitude, such as φ ∈ [0, 1], with variables of very small magnitude, such as p ∼ 10⁻⁵. For the Neural-PDE to better extract and learn the physical features of variables on different scales, we normalized the p data with a sigmoid function.
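A sigmoid normalization of the small-magnitude pressure can be sketched as follows (the paper does not specify its exact scaling; standardizing before the sigmoid is our assumption):

```python
import numpy as np

# The pressure field has values ~1e-5 while phi lies in [0, 1]; squashing a
# standardized p through a sigmoid puts both variables on a comparable scale.
# NOTE: standardizing first is an assumption, not the paper's stated recipe.
def sigmoid_normalize(p):
    z = (p - p.mean()) / (p.std() + 1e-12)
    return 1.0 / (1.0 + np.exp(-z))

p = 1e-5 * np.random.default_rng(0).standard_normal(100)  # toy pressure data
q = sigmoid_normalize(p)   # values now lie strictly in (0, 1), like phi
```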
Set δt = 5 × 10⁻⁴. The training dataset is generated by the FEM solver FreeFem++ (Hecht, 2012) using a Crank-Nicolson finite element scheme. The Neural-PDE prediction shows that the physical features of p and φ are successfully captured, with an overall MSE of 6.1631 × 10⁻⁷ (see Figure 7).

5. CONCLUSIONS

In this paper, we proposed a novel recurrent sequence deep learning framework, Neural-PDE, which is capable of intelligently filtering and learning solutions of time-dependent PDEs. One key innovation of our method is that the time-marching idea from numerical PDEs is applied within the deep learning framework, and the neural network is trained to reproduce accurate numerical solutions for prediction. State-of-the-art research has shown the promising power of deep learning in solving high-dimensional nonlinear problems in engineering, biology and finance, with efficiency in computation and accuracy in prediction. However, there are still unresolved issues in applying deep learning to PDEs. For instance, the stability and convergence of classical numerical algorithms have been rigorously studied by applied mathematicians, but due to the high nonlinearity of the neural network system and the curse of dimensionality (Hutzenthaler et al., 2019), theorems guaranteeing the stability and convergence of solutions predicted by the neural network remain to be established. Lastly, it would be helpful and interesting if one could theoretically characterize a numerical scheme from the learned network coefficients, and infer the form or mechanics of the scheme from the prediction. We leave these questions for future study.





Figure 3: Neural-PDE


Figure 4: Neural-PDE shows ideal prediction on Burgers' equation.

Figure 5: Neural-PDE shows ideal prediction on the 2d Burgers' equation.

Figure7: Predicted data by Neural-PDE (first row) and the exact data (second row) of volume fraction φ (column 1-3) and pressure p (column 4-6). The graphs of each columns 1-3 and 4-6 represent the time states of t 1 , t 2 , t 3 , where 0 ≤ t 1 < t 2 < t 3 ≤ 1.

Figure 10: δx = 0.02, δy = 0.02, δt = 10 -4 , MSE: 7.0741 × 10 -6 , the size of the test data is 10 and the test time period is 140.

Figure 11: (a) the exact solution u(x, y, t = 0.15) at the final state; (b) the model's prediction; (c) the error map.


Neural-PDE shows very small test MSE on 4 different PDEs.

Neural-PDE outperforms baseline in test MSE on 1d Allen-Cahn and Burgers equations.

Table 1 summarizes the experimental results of the Neural-PDE model on 4 different PDEs, on which it achieves extremely small MSEs, from ∼10⁻⁵ to ∼10⁻⁷. Table 2 compares our proposed Neural-PDE with the state-of-the-art Physics-Informed Neural Networks (PINN) (Raissi et al., 2019) on two PDEs (the 1d Allen-Cahn and 1d Burgers' equations). The Neural-PDE outperforms PINN while having far fewer parameters: PINN contains 4 hidden layers with 200 neurons per layer, whereas the Neural-PDE consists of only 3 layers (2 bi-LSTM layers with 20 neurons per layer and 1 dense output layer with 10 neurons).

A APPENDIX

A.1 FINITE ELEMENT METHOD

The finite element method (FEM) is a powerful numerical method for solving PDEs. Consider a 1D wave equation for u(x, t). The function u is approximated by a function u_h:

u_h(x, t) = Σ_{i=1}^{N} a_i(t) ψ_i(x) ,

where the ψ_i ∈ V are basis functions of some FEM space V, the a_i denote the coefficients, and N denotes the number of degrees of freedom. Multiplying the equation by an arbitrary test function ψ_j and integrating over the whole domain yields a linear system in which M is the mass matrix, A is the stiffness matrix, and a = (a_1, ..., a_N)^t is the N × 1 vector of coefficients at time t. The central difference method is then used for the time discretization (Johnson, 2012).

A.2 LONG SHORT-TERM MEMORY

Long Short-Term Memory networks (LSTM) (Hochreiter & Schmidhuber, 1997; Graves & Schmidhuber, 2005) are a class of artificial recurrent neural network (RNN) architectures commonly used for processing sequence data; they overcome the gradient vanishing issue of plain RNNs. Similar to most RNNs (Mikolov et al., 2011), an LSTM takes a sequence {x_1, x_2, ..., x_t} as input and learns hidden vectors {h_1, h_2, ..., h_t} for each corresponding input. In order to better retain long-distance information, LSTM cells are specifically designed to update the hidden vectors. The forward pass of each LSTM cell computes input, forget and output gates and a cell vector, where σ is the logistic sigmoid function, the Ws are weight matrices, the bs are bias vectors, and the subscripts i, f, o and c denote the input gate, forget gate, output gate and cell vectors respectively, all of which have the same size as the hidden vector h. This LSTM structure is used in the paper to simulate the numerical solutions of partial differential equations.
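The cell computations just described can be written out in plain NumPy (the shapes and the [h, x] concatenation layout below are our illustrative choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One forward pass of an LSTM cell: input (i), forget (f) and output (o) gates
# plus the cell-state update; every gate acts on the concatenation [h_prev, x_t].
def lstm_cell(x_t, h_prev, c_prev, W_i, W_f, W_o, W_c, b_i, b_f, b_o, b_c):
    z = np.concatenate([h_prev, x_t])
    i = sigmoid(W_i @ z + b_i)                     # input gate
    f = sigmoid(W_f @ z + b_f)                     # forget gate
    o = sigmoid(W_o @ z + b_o)                     # output gate
    c = f * c_prev + i * np.tanh(W_c @ z + b_c)    # new cell state
    h = o * np.tanh(c)                             # new hidden state
    return h, c

# Usage: process a length-5 scalar sequence with random (untrained) weights.
rng = np.random.default_rng(0)
d, hdim = 1, 4
Ws = [rng.standard_normal((hdim, hdim + d)) * 0.1 for _ in range(4)]
bs = [np.zeros(hdim) for _ in range(4)]
h, c = np.zeros(hdim), np.zeros(hdim)
for x in np.sin(np.linspace(0.0, 1.0, 5)):
    h, c = lstm_cell(np.array([x]), h, c, *Ws, *bs)
```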

A.3.1 WAVE EQUATION

Consider the 1d wave equation u_tt = c u_xx. Let c = 1/(16π²) and use the analytical solution given by the method of characteristics for the training and testing data.

A.3.2 HEAT EQUATION

The heat equation describes how a heat flow diffuses over time. The Black-Scholes model (Black & Scholes, 1973) is also developed from the physical laws behind the heat equation. Unlike the 1D case, which maps the data into the matrix (3.3.1) following its original spatial locations, the grids of high-dimensional PDEs are mapped into the matrix without regularization of the position, and the experimental results show that the Neural-PDE is able to capture the valuable features regardless of the order of the mesh grid points in the matrix. Let us start with a 2D heat equation as follows:

u_t = u_xx + u_yy , (A.3.5)
u(x, y, 0) = 0.9, if (x − 1)² + (y − 1)

