TRANSFORMERS FOR MODELING PHYSICAL SYSTEMS

Abstract

Transformers are widely used in natural language processing due to their ability to model longer-term dependencies in text. Although these models achieve state-of-the-art performance for many language-related tasks, their applicability outside of the natural language processing field has been minimal. In this work, we propose the use of transformer models for the prediction of dynamical systems representative of physical phenomena. Koopman-based embeddings provide a unique and powerful method for projecting any dynamical system into a vector representation, which can then be predicted by a transformer model. The proposed model is able to accurately predict various dynamical systems and outperforms classical methods that are commonly used in the scientific machine learning literature.

1. INTRODUCTION

The transformer model (Vaswani et al., 2017), built on self-attention, has largely become the state-of-the-art approach for a large set of natural language processing (NLP) tasks including language modeling, text classification, question answering, etc. Although more recent transformer work is focused on unsupervised pre-training of extremely large models (Devlin et al., 2018; Radford et al., 2019; Dai et al., 2019; Liu et al., 2019), the original transformer model garnered attention due to its ability to outperform other state-of-the-art methods by learning longer-term dependencies without recurrent connections. Given that the transformer model was originally developed for NLP, nearly all related work has been rightfully confined within this field, with only a few exceptions. Here, we focus on the development of transformers to model dynamical systems that can replace otherwise expensive numerical solvers. In other words, we are interested in using transformers to learn the language of physics.

The surrogate modeling of physical systems is a research field that has existed for several decades and is a large ongoing effort in scientific machine learning. A surrogate model is defined as an approximate model of a physical phenomenon that is designed to replace an expensive computational solver that would otherwise be needed to resolve the system of interest. The key characteristic of surrogate models is their ability to model a distribution of initial or boundary conditions rather than learning just one solution. This is arguably essential for the justification of training a deep learning model versus using a standard numerical solver. The most tangible applications of surrogates are optimization, design, and inverse problems, where many repeated simulations are typically needed. With the growing interest in deep learning, deep neural networks have been used for the surrogate modeling of a large range of physical systems in recent literature.
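The core idea behind Koopman-based embeddings can be illustrated with a classical toy system (this is a standard textbook example for intuition, not the learned embedding used in this work): for certain nonlinear maps, a hand-picked set of observables evolves exactly linearly, giving a vector representation of the state on which a sequence model such as a transformer can be trained.

```python
import numpy as np

# Classical Koopman toy system (illustrative; names a, b are arbitrary
# parameters, not quantities from this paper). The nonlinear map
#   x1' = a*x1
#   x2' = b*x2 + (a**2 - b)*x1**2
# becomes linear in the observables g(x) = [x1, x2, x1**2]: g' = K @ g.
a, b = 0.9, 0.5
K = np.array([[a,   0.0, 0.0],
              [0.0, b,   a**2 - b],
              [0.0, 0.0, a**2]])

def step(x):
    """One step of the nonlinear dynamics in the original state space."""
    return np.array([a * x[0], b * x[1] + (a**2 - b) * x[0] ** 2])

def embed(x):
    """Koopman observables: a vector representation of the state."""
    return np.array([x[0], x[1], x[0] ** 2])

x = np.array([1.0, 2.0])
# Advancing the embedding linearly matches embedding the advanced state,
# so the dynamics can be predicted entirely in the embedded space.
assert np.allclose(K @ embed(x), embed(step(x)))
print(K @ embed(x))  # → [0.9  1.31 0.81]
```

In general no finite set of closed-form observables exists, which motivates learning the embedding with a neural network and replacing the fixed linear operator with a more expressive sequence model.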
Standard deep neural network architectures such as auto-regressive (Mo et al., 2019; Geneva & Zabaras, 2020a), residual/Euler (González-García et al., 1998; Sanchez-Gonzalez, 2020), recurrent, and LSTM-based models (Mo et al., 2019; Tang et al., 2020; Maulik et al., 2020) have largely been demonstrated to be effective at modeling various physical dynamics. Such models generally rely exclusively on the past time-step to provide complete information on the current state of the system's evolution. Particularly for dynamical systems, present machine learning models lack generalizable, time-cognisant capabilities for predicting the multi-time-scale phenomena present in systems including turbulent fluid flow, multi-scale materials modeling, molecular dynamics, chemical processes, etc. Thus, currently adopted models struggle to maintain true physical accuracy for long-time



Code available at: [URL available after review]. Supplementary videos available at: https://sites.google.com/view/transformersphysx.

