TRANSFORMER-BASED MODEL FOR SYMBOLIC REGRESSION VIA JOINT SUPERVISED LEARNING

Abstract

Symbolic regression (SR) is an important technique for discovering hidden mathematical expressions from observed data. Transformer-based approaches, widely used in machine translation for their strong performance, have recently attracted considerable interest for SR. They take the data points as input, output an expression skeleton, and finally optimize the coefficients. However, recent transformer-based methods for SR concentrate on large-scale training data and ignore an ill-posed problem: the lack of sufficient supervision. Expressions that may be completely different receive the same supervision signal because they share the same skeleton, which makes it challenging to deal with data that may come from the same expression skeleton but with different coefficients. We therefore present a transformer-based model for SR that alleviates this problem. Specifically, we leverage a feature extractor based on pure residual MLP networks to obtain more information about the data points. Furthermore, as the core idea, we propose a joint learning mechanism combining supervised learning with supervised contrastive learning, which makes the features of data points drawn from expressions with the same skeleton more similar and thus effectively alleviates the ill-posed problem. Benchmark results show that the proposed method achieves a skeleton recovery rate up to 25% higher than typical transformer-based methods. Moreover, our method outperforms state-of-the-art SR methods based on reinforcement learning and genetic programming in terms of the coefficient of determination ($R^2$).
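To make the joint-learning idea concrete, the following is a minimal sketch (assuming a PyTorch setup; the function name, the per-sample skeleton-ID labels, and the temperature value are our own illustrative choices, not the authors' implementation) of a supervised contrastive loss that pulls together features of data points sampled from expressions sharing a skeleton:

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, skeleton_ids, temperature=0.1):
    # features: (B, D) embeddings of sampled data sets from the feature extractor.
    # skeleton_ids: (B,) integer ID of the skeleton each data set was generated from.
    z = F.normalize(features, dim=1)
    sim = z @ z.t() / temperature                       # (B, B) scaled cosine similarity
    B = z.size(0)
    not_self = ~torch.eye(B, dtype=torch.bool, device=z.device)
    # Positive pairs: data sets whose generating expressions share a skeleton.
    pos = (skeleton_ids.unsqueeze(0) == skeleton_ids.unsqueeze(1)) & not_self
    # Log-probability of each non-self pair under a softmax over its row.
    sim = sim.masked_fill(~not_self, float("-inf"))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Average over positives; anchors with no positive in the batch are skipped.
    pos_count = pos.sum(dim=1)
    has_pos = pos_count > 0
    loss = -log_prob.masked_fill(~pos, 0.0).sum(dim=1)
    return (loss[has_pos] / pos_count[has_pos]).mean()
```

During training, such a term would be combined with the usual supervised (e.g., cross-entropy) loss over skeleton tokens, which is the joint learning mechanism described above.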

1. INTRODUCTION

Exploring mathematical expressions that can be fitted to real-world observed data is at the core of expressing scientific discoveries. The correct expression would not only provide useful scientific insight simply by inspection but would also allow us to forecast how the underlying process will change in the future. The task of finding such an interpretable mathematical expression from observed data is called symbolic regression. More specifically, given a dataset $(X, y)$, where each feature $X_i \in \mathbb{R}^n$ and each target $y_i \in \mathbb{R}$, the goal of symbolic regression is to identify a function $f$ (i.e., $y \approx f(X): \mathbb{R}^n \to \mathbb{R}$) that best fits the dataset; a toy instance is sketched below. Symbolic regression is NP-hard because the search space of an expression grows exponentially with its length, and the presence of numeric constants further exacerbates the difficulty (Lu et al., 2016). Given this difficulty, genetic programming (GP) has become the most common approach to tackling symbolic regression problems (Forrest, 1993; Koza, 1994; Schmidt & Lipson, 2009; Staelens et al., 2013; Arnaldo et al., 2015; Bładek & Krawiec, 2019). GP-based methods iteratively "evolve" each generation of mathematical expressions through selection, crossover, and mutation. Although this approach can be effective, the expression it yields
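As a toy instance of the problem setup above (the ground-truth expression, variable ranges, and names are our own illustrative choices, not taken from the paper), the following sketch samples data from a hidden expression and scores candidate expressions by the coefficient of determination $R^2$; note that the two candidates share the skeleton $\sin(x_0) + c \cdot x_1^2$ and differ only in their constant:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(256, 2))      # features X_i in R^2
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2       # hidden ground-truth expression

def r2_score(y_true, y_pred):
    # Coefficient of determination: 1 - residual sum of squares / total sum of squares.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Two candidates with the same skeleton sin(x0) + c * x1^2, different constants.
print(r2_score(y, np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2))  # 1.0 (exact recovery)
print(r2_score(y, np.sin(X[:, 0]) + 0.3 * X[:, 1] ** 2))  # < 1.0 (wrong constant)
```

The second candidate has the correct skeleton but the wrong constant, which is precisely why the pipeline described in the abstract follows skeleton prediction with a coefficient-optimization step.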

