LEARNING SYMBOLIC MODELS FOR GRAPH-STRUCTURED PHYSICAL MECHANISM

Abstract

Graph-structured physical mechanisms are ubiquitous in real-world scenarios, so uncovering their underlying formulas is of great importance for scientific discovery. However, classical symbolic regression methods fail on this task because they can only handle input-output pairs that are not graph-structured. In this paper, we propose a new approach that generalizes symbolic regression to graph-structured physical mechanisms. The essence of our method is to model the formula skeleton with a message-passing flow, which transforms the discovery of the skeleton into a search for the message-passing flow. This transformation guarantees that we can efficiently search for a message-passing flow that is Pareto-optimal in terms of both accuracy and simplicity. Subsequently, the underlying formulas can be identified by interpreting the component functions of the searched message-passing flow, reusing classical symbolic regression methods. We conduct extensive experiments on datasets from different physical domains, including mechanics, electricity, and thermology, and on real-world datasets of pedestrian dynamics without ground-truth formulas. The experimental results not only verify the rationale of our design but also demonstrate that the proposed method can automatically learn precise and interpretable formulas for graph-structured physical mechanisms.

1. INTRODUCTION

For centuries, the development of the natural sciences has relied on human intuition to abstract physical mechanisms, represented by symbolic models (i.e., mathematical formulas), from experimental data recording the phenomena of nature. Many of these mechanisms are naturally graph-structured (Leech, 1966), where the physical quantities are associated with individual objects (e.g., mass), pair-wise relationships (e.g., force), and the whole system (e.g., overall energy), corresponding to three types of variables on graphs: node, edge, and global variables. For example, as shown in Figure 1 (a), the mechanical interaction mechanism in the multi-body problem corresponds to a graph with masses ($m_i$) and positions ($\vec{V}_i$) as attributes of nodes and spring constants ($k_{ij}$) as attributes of edges, which, together with the graph connectivity, yield the acceleration as an output attribute of nodes. In the case of a resistor circuit, nodes and edges correspond to voltages and resistances, respectively, and these attributes define a graph-level quantity, the overall power of the circuit. In the past few years, Symbolic Regression (SR) (Sahoo et al., 2018; Schmidt & Lipson, 2009; Udrescu et al., 2020), which searches for symbolic models $y = F(x)$ from experimentally obtained input-output pairs $\{(x, y)\}$ with $F$ being an explicit formula, has become a promising approach to automating scientific discovery. Traditional SR methods include genetic programming-based methods (Schmidt & Lipson, 2009; Fortin et al., 2012), which generate candidate formulas by "evolution" (i.e., manipulations), and deep learning-based methods (Li et al., 2019; Biggio et al., 2021; Zheng et al., 2021), which use sequence models to generate candidate formulas. However, these methods are designed for traditional SR problems on input-output pairs $\{(x, y)\}$ and do not consider graph information.
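To make the two examples above concrete, the following minimal Python sketch computes a node-level output (accelerations in a spring-coupled system) and a graph-level output (total power of a resistor circuit) from node and edge attributes. It is an illustration only, not the method proposed in this paper. The function names `spring_accelerations` and `circuit_power` are hypothetical, and the spring law is an assumption: ideal Hookean springs with zero rest length, so that the force on node i from edge (i, j) is k_ij * (V_j - V_i). The circuit power uses the standard Joule form (V_i - V_j)^2 / R_ij per edge.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # undirected edge between node indices (i, j)


def spring_accelerations(
    masses: List[float],
    positions: List[List[float]],
    springs: Dict[Edge, float],
) -> List[List[float]]:
    """Node-level output: a_i = (1 / m_i) * sum_j k_ij * (V_j - V_i).

    Assumption (not from the paper): zero-rest-length Hookean springs.
    """
    dim = len(positions[0])
    forces = [[0.0] * dim for _ in masses]
    for (i, j), k in springs.items():
        # Edge "message": force exerted on node i by the spring (i, j).
        f = [k * (pj - pi) for pi, pj in zip(positions[i], positions[j])]
        for d in range(dim):
            forces[i][d] += f[d]  # aggregate onto node i
            forces[j][d] -= f[d]  # equal and opposite force on node j
    # Node update: divide the aggregated force by the node's mass.
    return [[fd / m for fd in f] for f, m in zip(forces, masses)]


def circuit_power(voltages: List[float], resistances: Dict[Edge, float]) -> float:
    """Graph-level output: P = sum over edges of (V_i - V_j)^2 / R_ij."""
    return sum((voltages[i] - voltages[j]) ** 2 / r
               for (i, j), r in resistances.items())


if __name__ == "__main__":
    acc = spring_accelerations(
        masses=[1.0, 2.0],
        positions=[[0.0, 0.0], [1.0, 0.0]],
        springs={(0, 1): 2.0},
    )
    print(acc)
    print(circuit_power([5.0, 3.0, 0.0], {(0, 1): 2.0, (1, 2): 3.0}))
```

In both functions the computation follows the same pattern: a per-edge term is evaluated from the attributes of the edge and its two endpoint nodes, and the terms are then summed either per node or over the whole graph. A classical SR method that sees only flat pairs {(x, y)} has no built-in notion of this per-edge evaluation and aggregation.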
To exploit the inherent graph structure in physical mechanisms, as shown in Figure 1 (b), SR on graphs aims to find a formula $F$ that characterizes a mapping from the input $\{G, X\}$ to the output $y$, with $X$ and $y$ both defined on the graph structure $G$. To achieve this, we need both fine exploitation of inherent

