NEURAL DAES: CONSTRAINED NEURAL NETWORKS

Abstract

In this article we investigate the effect of explicitly adding auxiliary trajectory information to neural networks for dynamical systems. We draw inspiration from the field of differential-algebraic equations and differential equations on manifolds and implement similar methods in residual neural networks. We discuss constraints through stabilization as well as projection methods, and show when to use which method based on experiments involving simulations of multi-body pendulums and molecular dynamics scenarios. Several of our methods are easy to implement in existing code and have limited impact on training performance, while giving significant accuracy gains at inference time.

1. INTRODUCTION

Many scientific simulations of dynamical systems have natural invariants that can be expressed as constraints. Such constraints represent conservation of some quantities of the system under study. For example, in molecular dynamics, bond lengths between atoms are assumed fixed. Another example is incompressible fluid flow, where the divergence of the velocity field vanishes at every point in space and time. Similarly, in Maxwell's equations, the divergence of the magnetic field vanishes (there is no magnetic charge). Such additional information about the flow can be crucial if we are to keep the simulations faithful. As a result, a wealth of techniques have been proposed to conduct simulations that obey the constraints at least approximately (Weiglhofer, 1994; Ascher & Petzold, 1998; Allen et al., 2004).

In recent years, machine-learning-based techniques, and in particular deep neural networks, have been taking a growing role in modelling physical phenomena. In some cases, such techniques are used as inexpensive surrogates of the true physical dynamics, and in others they replace it altogether (see e.g. Wang et al. (2018); Degiacomi (2019); Miyanawala & Jaiman (2017)). These techniques use the wealth of data, either observed or numerically generated, to "learn" the parameters of a neural network so that the data are fit to some accuracy. The network is then expected to perform well on new data outside the training set and to yield simulation results that are accurate and reliable, in many cases at significantly lower computational cost.

From classical simulations, we know that it is often as important to accurately obey the additional constraint information as it is to accurately satisfy the underlying ordinary/partial differential equation (ODE/PDE) system. Nonetheless, no neural network architecture known to us is designed to honour such constraints or invariants.
The hope in standard training procedures is that by fitting the data, the network will "learn" the constraints and embed them implicitly in the weights. This, however, has been demonstrated to be insufficient in many cases (Wah & Qian, 2001). As we show in this paper on some very simple examples, neural networks may be able to approximately learn the dynamics, but they can drift off the constraint manifold. This leads to erroneous results that violate simple underlying physical properties. The question then is: how should additional constraint information be incorporated in a neural network architecture so that this physical information is, at least approximately, honoured?

The idea of adding constraint information to a network is essentially a continuation of the ongoing process of connecting mathematics with machine learning, explicitly adding known information to a neural network rather than having the network learn it implicitly (Willard et al., 2021). Equivariant networks are one example, where the symmetry of a problem is explicitly built into the neural network (Thomas et al., 2018).

In this paper we introduce a new paradigm in neural network architectures by developing methodologies that allow us to incorporate such additional information on a dynamical system. Motivated by the field of differential-algebraic equations (DAEs), we study two different approaches to the problem. The first incorporates the constraints into the network by Lagrange multipliers, and the second employs so-called stabilization techniques that penalize outputs that grossly violate the constraints. Both approaches have similar counterparts in the physical simulation world, and in particular in systems of DAEs (Ascher & Petzold, 1998). Our methodology is designed for residual neural networks; however, it can be adapted to other architectures as well.
We experiment with these methods on a number of well-known problems, focusing on molecular dynamics (MD) applications (Allen et al., 2004), which allow us to incorporate a variety of constraints, often resulting in significant performance improvements. The rest of this paper is organized as follows: Section 2 describes various ways in which constraints can be introduced in neural networks. Section 3 introduces our model problems and describes relevant constraints for them, while Section 4 presents a series of experiments on those problems using constrained neural networks. The paper is wrapped up in Section 5 with a discussion and conclusions.

2. RESIDUAL NETWORKS AND CONSTRAINTS

We consider the problem of statistical learning where we have data pairs (x_i, y_i), i = 1, . . . , n, with x ∈ X and y ∈ Y. Our goal is to find a function f, depending on parameters θ, that satisfies

f(x, θ) = y.    (1)

We focus our attention on functions f(·, ·) represented by neural networks. Such functions typically consist of a sequence of multiplications by "learnable" matrices followed by nonlinearities. In this work we particularly focus on the continuous form of residual network architectures, which reads

z(0) = K_o x,                              (2a)
dz/dτ = σ(z, θ(τ)),    τ ∈ [0, 1],         (2b)
y_c = K_c z(1).                            (2c)

The matrix K_o typically embeds the input data x ∈ X in a vector z in a larger space Z. Next, the residual network uses m layers with learnable parameters θ; such parameters can be weights, normalizations, and attention parameters. Finally, the larger space is closed by a learnable closing matrix K_c. For the class of problems we solve here, we assume that there is a given vector function c(·) such that

c(y_c) = c(K_c z(1)) = 0.    (3)

The ODE equation 2b with the initial condition equation 2a represents a trajectory z(τ) in the high-dimensional space Z. This trajectory can be projected into the low-dimensional physical space Y by setting y(τ) = K_c z(τ), which is assumed to be constrained. Therefore, along trajectories z satisfying equation 2, we require c(K_c z(1)) = 0 as in equation 3.

Given the output y_c, one trains the network by first discretizing the ODE (typically by the forward Euler method) and then minimizing a loss function that measures the difference between y_c and y. However, even when such parameters are learned, they rarely yield zero loss, especially on the validation set. Therefore, in general, the constraints c(K_c z) = 0 are not automatically fulfilled, and this can yield results that are physically infeasible. Our goal is to modify the architecture given in equation 2 so that the additional information given by equation 3 is addressed. We next discuss four such approaches.
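To make the forward-Euler discretization of the residual network concrete, here is a minimal NumPy sketch of equation 2; the dimensions, the tanh nonlinearity, the weight scaling, and all variable names are our illustrative assumptions, not choices specified in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: input space X, latent space Z, output space Y,
# and m layers (forward-Euler steps).
d_x, d_z, d_y, m = 2, 8, 2, 10

K_o = 0.1 * rng.standard_normal((d_z, d_x))      # opening (embedding) matrix
K_c = 0.1 * rng.standard_normal((d_y, d_z))      # closing matrix
theta = 0.1 * rng.standard_normal((m, d_z, d_z)) # per-layer weights theta(tau)

def sigma(z, W):
    """One layer's velocity field: a learnable linear map followed by tanh."""
    return np.tanh(W @ z)

def forward(x):
    """Forward-Euler discretization of dz/dtau = sigma(z, theta(tau)) on [0, 1]."""
    h = 1.0 / m                      # step size
    z = K_o @ x                      # z(0) = K_o x         (equation 2a)
    for j in range(m):               # z_{j+1} = z_j + h * sigma(z_j, theta_j)
        z = z + h * sigma(z, theta[j])
    return K_c @ z                   # y_c = K_c z(1)       (equation 2c)

y_c = forward(rng.standard_normal(d_x))
```

Each Euler step is exactly one residual block, z + h * sigma(z, theta_j), which is why the residual network is viewed as a discretized ODE.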

2.1. AUXILIARY REGULARIZATION

The simplest method for incorporating constraint information is to add an auxiliary regularization term to the loss function. Let L be the traditional loss. We extend it by adding a regularization term,

L_η = L + (η/2) c(y_c)^⊤ c(y_c),

where the positive parameter η determines the strength of the regularization.
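As a sketch of this regularized loss in NumPy: the mean-squared-error base loss and the unit-norm constraint below are our illustrative stand-ins, since the paper's actual constraints (e.g. fixed bond lengths) are problem-specific.

```python
import numpy as np

def constraint(y_c):
    # Hypothetical constraint c(y_c) = 0: fix the squared norm of y_c to 1
    # (a stand-in for, e.g., a fixed bond length in molecular dynamics).
    return np.array([y_c @ y_c - 1.0])

def regularized_loss(y_c, y, eta):
    # L_eta = L + (eta / 2) * c(y_c)^T c(y_c), with L a mean-squared-error loss.
    L = np.mean((y_c - y) ** 2)
    c = constraint(y_c)
    return L + 0.5 * eta * (c @ c)

# Here y_c matches the target and (numerically) satisfies the constraint,
# so both the base loss and the penalty are negligible.
loss = regularized_loss(np.array([0.6, 0.8]), np.array([0.6, 0.8]), eta=10.0)
```

Note that the penalty only discourages constraint violation in proportion to η; unlike the projection and Lagrange-multiplier approaches, it does not enforce c(y_c) = 0 exactly.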

Previous work on adding constraints to a neural network includes Stewart & Ermon (2017) and Xu et al. (2018), which add an auxiliary regularization term to the loss function; the physics-informed networks of Raissi et al. (2019), which shape the output of the neural network to fulfill a particular partial differential equation; and Li & Srikumar (2019), which incorporates first-order logic directly into the neural network.

