D-CIPHER: DISCOVERY OF CLOSED-FORM PARTIAL DIFFERENTIAL EQUATIONS

Abstract

Closed-form differential equations, including partial differential equations and higher-order ordinary differential equations, are one of the most important tools used by scientists to model and better understand natural phenomena. Discovering these equations directly from data is challenging because it requires modeling relationships between various derivatives that are not observed in the data (equation-data mismatch) and it involves searching across a huge space of possible equations. Current approaches make strong assumptions about the form of the equation and thus fail to discover many well-known phenomena. Moreover, many of them resolve the equation-data mismatch by estimating the derivatives, which makes them inadequate for noisy and infrequent observations. To this end, we propose D-CIPHER, which is robust to measurement artifacts and can uncover a new and very general class of differential equations. We further design a novel optimization procedure, CoLLie, to help D-CIPHER search through this class efficiently. Finally, we demonstrate empirically that it can discover many well-known equations that are beyond the capabilities of current methods.

1. INTRODUCTION

Scientists have been using mathematical equations to describe the world for centuries. In particular, closed-form differential equations have turned out to be one of the best tools for modeling physical phenomena. A differential equation describes a relationship between a quantity and its derivatives (rates of change); it is called closed-form if this relationship is described by a mathematical expression consisting of a finite number of variables, constants, arithmetic operations, and some well-known functions (e.g., exponential, logarithmic, and trigonometric functions) (footnote 1). Closed-form differential equations provide a general description of reality in a concise representation that is amenable to closer inspection by scientists. This renders them transparent and interpretable to human experts. Discovering these equations has required a thorough knowledge of the theory, strong mathematical skills, substantial creativity, and good intuition. The goal of this work is to discover closed-form differential equations directly from data, thus accelerating the process of scientific discovery.

Challenges in discovering differential equations from data

• Partial and higher-order derivatives. Many algorithms (Brunton et al., 2016; Qian et al., 2022) can only identify Ordinary Differential Equations (ODEs), which evolve with respect to only one variable (usually time). In contrast, many natural phenomena are described by equations involving several independent variables (e.g., spatial coordinates), called Partial Differential Equations (PDEs). Many equations also involve higher-order derivatives.
• Derivatives not observed. Discovering differential equations from data is challenging because the derivatives are usually not observed in the dataset (the equation-data mismatch (Qian et al., 2022)). This makes verifying a candidate equation a non-trivial task. Most of the methods proposed in the literature try to resolve this issue by estimating the derivatives (Brunton et al., 2016; Rudy et al., 2017). However, estimating the derivatives is difficult, especially when the data is sampled infrequently or with high noise (Qian et al., 2022; Messenger & Bortz, 2021a).
• Strong assumptions and constrained search space. The majority of algorithms for identifying differential equations make many assumptions about the form of the equation. In particular, they make the evolution assumption (defined and explained later) and assume that the equation can be represented as a linear combination of prespecified functions and differential operators (Brunton et al., 2016; Messenger & Bortz, 2021a). However, many well-known equations, such as a forced harmonic oscillator or an inhomogeneous wave equation, cannot be represented in that way.

Currently, a few algorithms tackle some, but not all, of these challenges. In particular, Weak SINDy (Messenger & Bortz, 2021a) is able to discover PDEs without estimating the derivatives by utilizing a variational approach. However, the equation is constrained to a form amenable to a sparse regression algorithm.
D-CODE (Qian et al., 2022), on the other hand, uses a variational approach in conjunction with a symbolic regression algorithm to discover closed-form ODEs. However, it cannot handle higher-order derivatives or multiple independent variables, so it cannot be used to discover closed-form PDEs. Algorithms that do not require the evolution assumption appeared in Mangan et al. (2016) and Kaheman et al. (2020), but they require derivative estimation and only consider equations represented as linear combinations of prespecified functions.

Contributions. In this work, we develop the Discovery of Closed-form Partial and Higher-order Differential Equations in a Robust Way framework (D-CIPHER), which does not estimate the derivatives, requires fewer assumptions, has a broader search space than previous techniques, and works for both higher-order ODEs and PDEs. Our contributions are as follows:

• We examine the landscape of different types of PDEs from the discovery perspective. In particular, we introduce new notions such as evolution form, evolution assumption, derivative-bound part, and derivative-free part. We use them to describe what kinds of PDEs can be discovered with current methods and to motivate our new class of differential equations. (Section 2)
• We propose a new general class of PDEs (Variational-Ready PDEs) that admit the variational formulation (and thus allow us to circumvent derivative estimation). We also prove a theorem that motivates a novel objective function. (Section 4)
• We use the novel objective function to develop D-CIPHER, a new algorithm that searches over the Variational-Ready PDEs. (Section 5)
• We develop a new optimization procedure (CoLLie) to efficiently solve a constrained least-squares problem and thus help D-CIPHER search through this space efficiently. (Section 6)
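The idea behind the variational approach mentioned above can be illustrated with a minimal sketch. It is not the D-CIPHER objective or the Weak SINDy implementation; the candidate family u' + rate * u = 0, the test function phi(t) = t^2 (1 - t)^2, the trapezoidal rule, and all function names are assumptions chosen for illustration. The point is that integration by parts moves the derivative from the observed signal onto a known test function, so a candidate equation can be checked using samples of u alone.

```python
import math

def trapezoid(values, step):
    """Composite trapezoidal rule on a uniform grid."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))

def weak_residual(u_samples, grid, rate):
    """Weak-form residual of the hypothetical candidate ODE u' + rate * u = 0.

    Because the test function phi vanishes at both endpoints,
    int u' phi dt = -int u phi' dt, so the residual
    int (u' + rate * u) phi dt equals int u (rate * phi - phi') dt
    and never requires an estimate of u'.
    """
    step = grid[1] - grid[0]
    integrand = []
    for t, u in zip(grid, u_samples):
        phi = t**2 * (1 - t)**2               # test function, zero at t = 0 and t = 1
        dphi = 2 * t * (1 - t) * (1 - 2 * t)  # analytic derivative of phi
        integrand.append(u * (rate * phi - dphi))
    return trapezoid(integrand, step)

# Samples of u(t) = exp(-t), which satisfies u' + u = 0 on [0, 1].
n = 201
grid = [i / (n - 1) for i in range(n)]
u_samples = [math.exp(-t) for t in grid]

res_true = weak_residual(u_samples, grid, rate=1.0)   # correct candidate
res_wrong = weak_residual(u_samples, grid, rate=2.0)  # incorrect candidate
print(res_true, res_wrong)
```

The residual of the correct candidate is close to zero (up to quadrature error), while the incorrect candidate leaves a clearly non-zero residual. A practical method would use many test functions and aggregate the residuals.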

2. PARTIAL DIFFERENTIAL EQUATIONS

In this section, we provide background information about Partial Differential Equations and introduce new notions necessary for the following discussion.

Notation and definitions. We denote the set $\{1, 2, \ldots, n\}$ as $[n]$ and the set of non-negative integers as $\mathbb{N}_0$. Throughout this paper we let $M, N, K \in \mathbb{N}$ be some natural numbers and let $\Omega \subset \mathbb{R}^M$ be an open set inside $\mathbb{R}^M$. A comprehensive table with all symbols used in this work can be found in Appendix A together with some definitions restated more formally.

Going beyond ODEs. The simplest differential equations are ordinary differential equations that describe quantities that evolve with respect to only one independent variable, usually time. Most methods assume that the ODE is explicit and as such can be represented as a system of first-order ODEs:

$$\dot{u}_j(t) = f_j(t, u(t)),$$

where $\dot{u}_j$ represents the derivative of $u_j$. Then the discovery problem is reduced to deciding the order of the derivative (usually first or second) and the discovery of $f_j$. For PDEs, it is not enough to talk about the derivative, as we can take derivatives with respect to different variables. We denote the mixed derivative as $\partial^{\alpha}$, where $\alpha \in \mathbb{N}_0^M$ is called a multi-index, and define it as $\partial^{\alpha} = \partial_1^{\alpha_1} \partial_2^{\alpha_2} \cdots \partial_M^{\alpha_M}$. Each $\partial_i^{\alpha_i} = \partial^{\alpha_i} / \partial x_i^{\alpha_i}$ is an $\alpha_i$-th-order partial derivative with respect to $x_i$ (the $i$-th independent variable) (footnote 2). We define the order of $\alpha$ as $|\alpha| = \sum_{i=1}^{M} \alpha_i$. We call $\partial^{\alpha}$ non-trivial if $|\alpha| > 0$. A PDE of order $K$ is any equation of the form

$$f(x, u(x), \partial^{[K]} u(x)) = 0 \quad \forall x \in \Omega \tag{2}$$
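The multi-index notation can be made concrete with a small sketch. It is purely illustrative and not part of the D-CIPHER method: the polynomial representation (a dictionary mapping exponent tuples to coefficients) and the names `order`, `falling_factorial`, and `mixed_derivative` are assumptions introduced here. The example applies a mixed derivative to a polynomial in M = 2 variables and checks that the order in which the partial derivatives are taken does not matter.

```python
def order(alpha):
    """Order |alpha| of a multi-index: the sum of its entries."""
    return sum(alpha)

def falling_factorial(p, k):
    """p * (p - 1) * ... * (p - k + 1): the factor from differentiating x**p k times."""
    result = 1
    for j in range(k):
        result *= p - j
    return result

def mixed_derivative(poly, alpha):
    """Apply the mixed derivative with multi-index alpha to a polynomial.

    The polynomial is stored as {exponent tuple: coefficient}, e.g.
    {(2, 3): 3} stands for 3 * x1**2 * x2**3.
    """
    out = {}
    for exps, coeff in poly.items():
        if any(p < a for p, a in zip(exps, alpha)):
            continue  # the term is differentiated to zero
        factor = 1
        for p, a in zip(exps, alpha):
            factor *= falling_factorial(p, a)
        new_exps = tuple(p - a for p, a in zip(exps, alpha))
        out[new_exps] = out.get(new_exps, 0) + coeff * factor
    return out

# u(x1, x2) = 3 * x1**2 * x2**3 + 5 * x1
u = {(2, 3): 3, (1, 0): 5}
alpha = (1, 2)  # one derivative in x1, two derivatives in x2

result = mixed_derivative(u, alpha)

# The same derivative taken one variable at a time, in both orders.
x1_then_x2 = mixed_derivative(mixed_derivative(u, (1, 0)), (0, 2))
x2_then_x1 = mixed_derivative(mixed_derivative(u, (0, 2)), (1, 0))
print(order(alpha), result)
```

Here the multi-index (1, 2) has order 3, and the result represents 36 * x1 * x2. Both orderings give the same polynomial, which is why a single multi-index is enough to specify a mixed derivative for sufficiently smooth functions.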



Footnote 1 (closed-form expressions): Detailed discussion in Appendix A.2.

Footnote 2 (mixed derivatives): Throughout this work we assume that the functions we use are smooth enough for the equality of mixed partials (Spivak, 2018) to hold. In that case, any mixed derivative can be uniquely specified by a multi-index.

