FORMAL MATHEMATICS STATEMENT CURRICULUM LEARNING

Abstract

We explore the use of expert iteration in the context of language modeling applied to formal mathematics. We show that at the same compute budget, expert iteration, by which we mean proof search interleaved with learning, dramatically outperforms proof search only. We also observe that when applied to a collection of formal statements of sufficiently varied difficulty, expert iteration is capable of finding and solving a curriculum of increasingly difficult problems, without the need for associated ground-truth proofs. Finally, by applying this expert iteration to a manually curated set of problem statements, we surpass the previous state of the art on the miniF2F benchmark, automatically solving multiple challenging problems drawn from high school olympiads.

1. INTRODUCTION

Deep learning has enjoyed spectacular success in many domains, including language (Brown et al., 2020; Devlin et al., 2019; Wu et al., 2016), vision (Radford et al., 2021; Tan & Le, 2019), and image generation (Ramesh et al., 2021; Karras et al., 2019). One domain where deep learning has not yet enjoyed comparable success is tasks that require extensive planning and symbolic reasoning, with the exception of two-player games (Silver et al., 2016; 2017; Berner et al., 2019; Vinyals et al., 2019). In such games, deep learning systems exhibit a considerable degree of reasoning, especially when trained with self-play combined with a search procedure such as Monte Carlo Tree Search (MCTS) (Browne et al., 2012). But the resulting reasoning abilities are limited due to the relatively narrow scope of games.

As such, theorem proving in interactive proof assistants, or formal mathematics, appears as an interesting game-like domain to tackle due to its increased scope. The typical task consists of generating a machine-checkable proof given a formal statement. Like games, formal mathematics has an automated way of determining whether a trajectory (i.e. a proof) is successful (i.e. formally correct). But the vast scope of formal mathematics means that any strong reasoning result obtained in it will be more meaningful than comparable results in games (e.g. finding proofs to mathematical conjectures), and could even be applicable to important practical problems (e.g. software verification).

However, tackling formal mathematics involves two main challenges that we must address in order to continue making progress:

Infinite action space. Not only does formal mathematics have an extremely large search space (like Go (Silver et al., 2016) for example), it also has an infinite action space.
At each step of proof search, the model must choose not from a well-behaved finite set of actions, but from a complex and infinite set of tactics, potentially involving exogenous mathematical terms that have to be generated (e.g., generating a mathematical statement to be used as a witness, an object used in steps such as "there exists an x ...", or a cut, the introduction and chaining of a lemma in the middle of a proof).

No direct self-play setup. In formal mathematics, a prover is not playing against an opponent but against a set of statements to prove. When faced with a statement that is just too hard, there is no

† Work performed while at OpenAI.

