TOWARDS FINDING LONGER PROOFS

Abstract

We present a reinforcement learning (RL) based guidance system for automated theorem proving, geared towards Finding Longer Proofs (FLoP). FLoP is a step towards learning to reason by analogy, reducing the dependence on large-scale search in automated theorem provers. We use several simple, structured datasets with very long proofs to show that FLoP can successfully generalise a single training proof to a large class of related problems, implementing a simple form of analogical reasoning. On these benchmarks, FLoP is competitive with strong theorem provers despite using very limited search.

1. INTRODUCTION

Automated Theorem Proving (ATP) is the study of using machines for formal mathematical reasoning. It is related to general game playing; for example, the game of Go can be viewed as a simple formal system. Building on the recent success of machine learning, a growing trend in this field is to use learning methods to make theorem provers more powerful. Several research projects have shown that learning can be used to replace or surpass human-engineered heuristics. Despite huge improvements, interesting mathematical theorems remain elusive today. One crucial shortcoming of ATP systems is that they can typically find only relatively short proofs. In this paper, we address this shortcoming and ask how machine learning can be used to solve problems requiring very long inference chains. We argue that the fundamental reason why current ATP systems are limited to short proofs is that they focus on the search aspect of the task. It is natural to see theorem proving as a search problem: each proof step involves a choice from a set of valid inferences, yielding a search space that grows exponentially with the length of the proof. Due to this exponential blowup, search is bound to fail beyond a certain depth, except for special classes of problems where one of the prover's smart human-engineered heuristics allows it to find the solution without search. As W. W. Bledsoe observed (Bledsoe, 1986): "Automated theorem proving . . . is not the beautiful process we know as mathematics. This is 'cover your eyes with blinders and hunt through a cornfield for a diamond-shaped grain of corn'." Approaches that try to avoid excessive search broadly fall into three categories: 1) Perform large steps, such as the invocation of tactics or decision procedures in SMT solvers (Barrett & Tinelli, 2018). 2) Perform hierarchical reasoning by first creating a high-level proof plan and then gradually refining it to the calculus level, e.g.
Bundy (1988); Melis & Siekmann (1999). 3) Reason by analogy, e.g. Melis (1995); Brock et al. (1988). Reasoning by analogy involves observing the proof of one problem, extracting its core idea, and successfully applying it to another. Note that, in this formulation, success depends only weakly on proof length. On the other hand, establishing mappings between proofs is challenging and depends heavily on a proper data representation, which has been a major bottleneck for this approach since the beginnings of ATP. However, with the advent of machine learning methods capable of automatically discovering good data embeddings, the analogy approach seems worth revisiting. Our work aims to identify machine learning methods that are suitable for analogical reasoning and, as a result, capable of solving problems with long proofs. Many successful ATP systems (Urban et al., 2008; Jakubuv & Urban, 2019; Chvalovský et al., 2019; Bansal et al., 2019a; Kaliszyk et al., 2018; Zombori et al., 2020; Olsák et al., 2020; Polu & Sutskever, 2020) implement the MaLARea (Urban, 2007; Urban et al., 2008) learning/reasoning loop (described later also as the DAgger (Ross et al., 2011) meta-algorithm). The MaLARea loop interleaves ATP

