DO TRANSFORMERS UNDERSTAND POLYNOMIAL SIMPLIFICATION?

Abstract

Recently, researchers have demonstrated that Transformers can be trained to learn symbolic tasks such as integration and solving differential equations in an end-to-end fashion. In these setups, given an input symbolic expression, the Transformer predicts the final solution in a single step. Since such tasks may consist of a sequence of logical steps, the question remains whether such networks have understood and learnt the individual steps needed to reach the solution. To take a deeper look, we consider the task of polynomial simplification. Polynomials can be written in a simple normal form as a sum of monomials ordered lexicographically. For a polynomial not necessarily in this normal form, a sequence of simplification steps is applied to reach the fully simplified (i.e., normal-form) polynomial. For this task, we describe a synthetic polynomial dataset generation algorithm which generates polynomials with unique proof steps. We then conduct an extensive analysis of the Transformer's ability to learn the polynomial simplification task along different dimensions.

1. INTRODUCTION

With the state-of-the-art performance of Deep Neural Nets (DNNs) in perceptual tasks, researchers have started to explore their logical reasoning capabilities, in particular within the domain of Automated Theorem Proving (ATP). In these domains (LEAN (de Moura et al., 2015), HOL Light, and Mizar (miz, 2020)), many recent works (Paliwal et al., 2020; Aygün et al., 2020; Hahn et al., 2020) have shown that Graph Neural Networks (Gori et al., 2005; Veličković et al., 2018) and Transformers (Vaswani et al., 2017) can be trained to perform impressively on the theorem-proving task as part of a neuro-symbolic system. In a related but different development, Lample & Charton (2019) recently showed that for symbolic integration and differential equations, a large amount of synthetic end-to-end examples can be generated using symbolic systems. In these tasks, the authors show that Transformer networks can be trained to produce the final solution from an input integral (or differential equation) in a single step. This points to the exciting possibility of using deep neural nets to learn end-to-end theorem provers, which can be beneficial for formal mathematics (Szegedy, 2020). However, the setup combines multiple reasoning steps in a single shot. Additionally, integration (or differential equation solving) is a complex task requiring an understanding of the integral symbols, functions, variables, and the basic concepts of arithmetic. As the system in Lample & Charton (2019) is simply trained to output the top solution(s) and corresponding confidence score(s), it is unclear what internal mechanisms enable these models to solve these problems. This lack of transparency has been noted in this context (Davis, 2019). An earlier work by Piotrowski et al. (2019) showed similar results for certain symbolic manipulation tasks, and their work shares the same limitation.
In this paper we ask whether, instead of only producing the end result of a symbolic manipulation or an integral, the model can also produce a human-readable proof. While we do not know if these models reason the way humans do, one way to produce proofs would be to "extract" a proof from models of the above type by "probing" them in some manner. The problem of unraveling the inner workings of Transformers by probing is an active area of research; however, at present our understanding is still evolving (Rogers et al., 2020). Hence, taking a detour, we instead train the model to produce the full proof.
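To make the normal form concrete, the following is a minimal sketch (not the paper's dataset generator) of the target of the simplification task: a polynomial is represented as a list of monomials, each a (coefficient, exponent-vector) pair over a fixed variable ordering, and normalization combines like terms and sorts the monomials lexicographically by exponent vector. The representation and function name are illustrative assumptions.

```python
# Illustrative sketch of the normal form described above: a polynomial in two
# variables (x, y) is a list of (coefficient, exponents) pairs, e.g.
# 3*x**2*y -> (3, (2, 1)). This is an assumed encoding, not the paper's.

def to_normal_form(monomials):
    """Combine like terms and sort monomials in lexicographic order of
    their exponent vectors (highest first), dropping zero terms."""
    combined = {}
    for coeff, exps in monomials:
        combined[exps] = combined.get(exps, 0) + coeff
    return [(c, e) for e, c in sorted(combined.items(), reverse=True) if c != 0]

# (x + y)*(x - y) + 2*x*y, expanded term by term but not yet simplified:
poly = [(1, (2, 0)), (-1, (1, 1)), (1, (1, 1)), (-1, (0, 2)), (2, (1, 1))]
print(to_normal_form(poly))  # [(1, (2, 0)), (2, (1, 1)), (-1, (0, 2))]
```

The printed result encodes x^2 + 2xy - y^2: like terms in xy have been merged and the monomials appear in decreasing lexicographic order of exponents. A step-wise proof, as studied in the paper, would record each intermediate merge rather than only this final form.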

