Learning advanced mathematical computations from examples

Abstract

Using transformers over large generated datasets, we train models to learn mathematical properties of differential systems, such as local stability, behavior at infinity, and controllability. We achieve near-perfect prediction of qualitative characteristics and good approximations of numerical features of the system. This demonstrates that neural networks can learn to perform complex computations, grounded in advanced theory, from examples, without built-in mathematical knowledge.

1. Introduction

Scientists solve mathematical problems by applying rules and computational methods to the data at hand. These rules are derived from theory; they are taught in schools or implemented in software libraries, and they guarantee that a correct solution will be found. Over time, mathematicians have developed a rich set of computational tools that can be applied to many problems, and that have been said to be "unreasonably effective" (Wigner, 1960). Deep learning, on the other hand, learns from examples and solves problems by improving a random initial solution, without relying on domain-related theory and computational rules. Deep networks have proven extremely efficient on a large number of tasks, but struggle on relatively simple, rule-driven arithmetic problems (Saxton et al., 2019; Trask et al., 2018; Zaremba and Sutskever, 2014).

Yet, recent studies show that deep learning models can learn complex rules from examples. In natural language processing, models learn to output grammatically correct sentences without prior knowledge of grammar and syntax (Radford et al., 2019), or to automatically map one language into another (Bahdanau et al., 2014; Sutskever et al., 2014). In mathematics, deep learning models have been trained to perform logical inference (Evans et al., 2018), SAT solving (Selsam et al., 2018), or basic arithmetic (Kaiser and Sutskever, 2015). Lample and Charton (2020) showed that transformers can be trained from generated data to perform symbol manipulation tasks, such as function integration and finding formal solutions of ordinary differential equations.

In this paper, we investigate the use of deep learning models for complex mathematical tasks involving both symbolic and numerical computations. We show that models can predict the qualitative and quantitative properties of mathematical objects, without built-in mathematical knowledge.
We consider three advanced problems of mathematics: the local stability and controllability of differential systems, and the existence and behavior at infinity of solutions of partial differential equations. All three problems have been widely researched and have many applications outside pure mathematics. They have known solutions that rely on advanced symbolic and computational techniques, from formal differentiation, Fourier transforms, and algebraic full-rank conditions to function evaluation, matrix inversion, and the computation of complex eigenvalues. We find that neural networks can solve these problems with very high accuracy, simply by looking at instances of problems and their solutions, while being totally unaware of the underlying theory. In one of the quantitative problems where several solutions are possible (predicting a control feedback matrix), neural networks are even able to predict solutions different from those generated with the mathematical algorithms we used for training. After reviewing prior applications of deep learning to related areas, we introduce the three problems we consider, describe how we generate datasets, and detail how we train our models. Finally, we present our experiments and discuss their results.
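One of the algebraic full-rank conditions mentioned above, used in the classical theory of controllability, is the Kalman rank test: a linear system x' = Ax + Bu is controllable exactly when the matrix [B, AB, ..., A^(n-1)B] has full rank. As an illustration of the kind of computation involved (not a description of this paper's models or data-generation code), a minimal numpy sketch:

```python
import numpy as np

def is_controllable(A, B):
    """Kalman rank test: the linear system x' = Ax + Bu is controllable
    iff the controllability matrix [B, AB, ..., A^(n-1)B] has full rank n."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])  # next block: A times the previous one
    C = np.hstack(blocks)              # n x (n*m) controllability matrix
    return bool(np.linalg.matrix_rank(C) == n)

# A double integrator (x'' = u), written as a first-order system,
# is controllable from a single input:
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
print(is_controllable(A, B))  # True
```

The function names and the example system are illustrative choices; the paper's learning task is to predict such properties directly from problem instances, without performing this computation explicitly.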

2. Related work

Applications of neural networks to differential equations have mainly focused on two themes: numerical approximation and formal resolution. Whereas most differential systems and partial differential equations cannot be solved explicitly, their solutions can be approximated numerically, and neural networks have been used for this purpose (Lagaris et al., 1998; 2000; Lee and Kang, 1990; Rudd, 2013; Sirignano and Spiliopoulos, 2018). This approach relies on the universal approximation theorem, which states that any continuous function can be approximated by a neural network with one hidden layer, for a wide range of activation functions (Cybenko, 1989; Hornik et al., 1990; Hornik, 1991; Petersen and Voigtlaender, 2018; Pinkus, 1999). This has proven especially efficient for high-dimensional problems. For formal resolution, Lample and Charton (2020) proposed several approaches to generate arbitrarily large datasets of functions with their integrals, and of ordinary differential equations with their solutions. They found that a transformer model (Vaswani et al., 2017) trained on millions of examples could outperform state-of-the-art symbolic frameworks such as Mathematica or MATLAB (Wolfram-Research, 2019; MathWorks, 2019) on a particular subset of equations. Their model was used to guess solutions, while verification (arguably a simpler task) was left to a symbolic framework (Meurer et al., 2017). Arabshahi et al. (2018a; b) proposed to use neural networks to verify the solutions of differential equations, and found that Tree-LSTMs (Tai et al., 2015) were better than sequential LSTMs (Hochreiter and Schmidhuber, 1997) at generalizing beyond the training distribution. Other approaches investigated the capacity of neural networks to perform arithmetic operations (Kaiser and Sutskever, 2015; Saxton et al., 2019; Trask et al., 2018) or to run short computer programs (Zaremba and Sutskever, 2014). More recently, Saxton et al.
(2019) found that neural networks were good at solving arithmetic problems or at performing operations such as differentiation or polynomial expansion, but struggled on tasks like prime number decomposition or primality tests that require a significant number of steps to compute. Unlike the questions considered here, most of those problems can be solved by simple algorithmic computations.

3. Differential systems and their stability

A differential system of degree $n$ is a system of $n$ equations of $n$ variables $x_1(t), \dots, x_n(t)$,

$$\frac{dx_i(t)}{dt} = f_i\big(x_1(t), x_2(t), \dots, x_n(t)\big), \quad \text{for } i = 1 \dots n,$$

or, in vector form, with $x \in \mathbb{R}^n$ and $f : \mathbb{R}^n \to \mathbb{R}^n$,

$$\frac{dx(t)}{dt} = f\big(x(t)\big).$$

Many problems can be set as differential systems. Special cases include $n$-th order ordinary differential equations (letting $x_1 = y$, $x_2 = y'$, ..., $x_n = y^{(n-1)}$), systems of coupled differential equations, and some particular partial differential equations (separable equations or equations with characteristics).

Differential systems are one of the most studied areas of mathematical sciences. They are found in physics, mechanics, chemistry, biology, and economics, as well as in pure mathematics. Most differential systems have no explicit solution. Therefore, mathematicians have studied the properties of their solutions, and first and foremost their stability, a notion of paramount importance in many engineering applications.
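The standard computation behind local stability, against which a learned model's predictions can be checked, linearizes the system at an equilibrium and examines the eigenvalues of the Jacobian: the equilibrium is locally asymptotically stable when all eigenvalues have strictly negative real part. A minimal numpy sketch of that test (illustrative only; the Jacobian is assumed to be supplied, e.g. computed symbolically or by finite differences):

```python
import numpy as np

def is_locally_stable(J):
    """An equilibrium of x' = f(x) is locally asymptotically stable when
    every eigenvalue of the Jacobian J = Df(x_eq) has negative real part."""
    return bool(np.all(np.linalg.eigvals(J).real < 0))

# Damped oscillator x' = y, y' = -x - y: Jacobian at the origin.
# Its eigenvalues solve l^2 + l + 1 = 0, so both have real part -1/2.
J = np.array([[0.0, 1.0], [-1.0, -1.0]])
print(is_locally_stable(J))  # True
```

In the learning setting described above, the network receives only the system and a label (or the eigenvalue with largest real part) produced by such a computation, never the rule itself.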

