LEARNING TO REASON IN LARGE THEORIES WITHOUT IMITATION

Abstract

In this paper, we demonstrate how to perform automated higher-order logic theorem proving in the presence of a large knowledge base of potential premises without learning from human proofs. We augment the exploration of premises in a deep reinforcement learning scenario with a simple lookup based on tf-idf (term frequency-inverse document frequency). Our experiments show that our theorem prover trained with this exploration mechanism but no human proofs, dubbed DeepHOL Zero, outperforms provers that are trained only on human proofs, and approaches the performance of a prover trained by a combination of imitation and reinforcement learning. We perform multiple experiments to understand the importance of the underlying assumptions that make our exploration approach work, thus explaining our design choices.

1. INTRODUCTION

Theorem proving is a challenging benchmark for automated reasoning, and is an important milestone on the road to demonstrating that machine learning can produce a deep understanding of abstract concepts. In the long run, automated mathematical reasoning may become an important tool in engineering and scientific discovery. Due to their success in many other areas, neural networks have recently been considered as a way to guide theorem proving (Alemi et al., 2016; Loos et al., 2017; Huang et al., 2019; Bansal et al., 2019; Paliwal et al., 2020) and to demonstrate approximate mathematical reasoning abilities in latent space (Lee et al., 2020). While only a relatively small number of fundamental proof rules (or proof tactics) are applicable at any point in a proof, there is a very large number of premises (i.e., previously proven theorems and lemmas) that could be invoked. The largest formalized libraries contain tens of thousands of theorems that can be used as premises. Thus, the main problem of reasoning in large theories is to identify the premises relevant in the current context and thereby reduce the branching factor of the proof search to a manageable size. This problem will become even more pronounced over time, as theorem provers become more powerful and the number of available premises grows. Previous works have relied on human proofs to either directly provide or learn (Bansal et al., 2019; Paliwal et al., 2020) which premises are relevant to the current proof. However, any open-ended system for mathematical reasoning needs to be able to learn which premises are relevant without human guidance. In this work, we thus consider the problem of training a theorem prover without access to human proofs. In particular, the contributions of this work are:

1. We demonstrate that training a theorem prover without human proof data can succeed when using deep reinforcement learning. We do this with minimal additional engineering: by augmenting the exploration of premises with a portion of the premises selected by a tf-idf (Manning et al., 2008) metric.

2. We provide a first side-by-side comparison of the effect of the availability of human proofs on final theorem proving performance. We learn to prove more theorems than a prover trained on human proofs alone, and almost as many as with the combination of both approaches.

3. We establish, through multiple ablation experiments, the underlying properties of the proof assistant and reinforcement learning setup that make our approach work.

We thereby remove one of the roadblocks on the way to open-ended learning of mathematical reasoning in large theories.
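To make the tf-idf based exploration concrete, the following is a minimal, self-contained sketch of ranking premises by tf-idf similarity to the current goal and mixing the top-ranked premises with random ones during exploration. This is illustrative only: the tokenization, the function names, and the mixing scheme (`tfidf_fraction`) are our own simplifications, not the paper's implementation.

```python
import math
import random
from collections import Counter

def tokenize(statement):
    # Crude whitespace tokenization of a formal statement.
    return statement.lower().split()

def tfidf_vectors(premises):
    """Compute sparse tf-idf vectors (token -> weight) for premise statements."""
    docs = [Counter(tokenize(p)) for p in premises]
    n = len(docs)
    df = Counter()  # document frequency of each token
    for doc in docs:
        df.update(doc.keys())
    idf = {t: math.log(n / df[t]) for t in df}
    vecs = []
    for doc in docs:
        total = sum(doc.values())
        vecs.append({t: (c / total) * idf[t] for t, c in doc.items()})
    return vecs, idf

def cosine(u, v):
    num = sum(u[t] * v[t] for t in u if t in v)
    den = (math.sqrt(sum(x * x for x in u.values()))
           * math.sqrt(sum(x * x for x in v.values())))
    return num / den if den else 0.0

def rank_premises(goal, premises, k=3):
    """Return the k premises most similar to the goal under tf-idf."""
    vecs, idf = tfidf_vectors(premises)
    goal_counts = Counter(tokenize(goal))
    total = sum(goal_counts.values())
    goal_vec = {t: (c / total) * idf.get(t, 0.0) for t, c in goal_counts.items()}
    scored = sorted(zip(premises, vecs),
                    key=lambda pv: cosine(goal_vec, pv[1]), reverse=True)
    return [p for p, _ in scored[:k]]

def explore_premises(goal, premises, k, tfidf_fraction=0.5):
    # Fill a fraction of the premise slots by tf-idf similarity and the
    # rest by uniform random exploration (a hypothetical mixing scheme).
    n_tfidf = int(k * tfidf_fraction)
    chosen = rank_premises(goal, premises, n_tfidf)
    rest = [p for p in premises if p not in chosen]
    chosen += random.sample(rest, min(k - n_tfidf, len(rest)))
    return chosen
```

In a real prover, the "documents" would be the statements of all theorems in the knowledge base, and the ranked premises would be fed as candidate tactic arguments during proof search.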

2. RELATED WORK

Reinforcement learning (RL) without imitation learning has been successful for computer games (cf. Mnih et al. (2013)), and Silver et al. (2017) later demonstrated that imitation learning is not necessary even for complex games like Chess and Go. For games with much larger action spaces, learning methods still rely on human imitation due to the exploration problem (cf. Vinyals et al. (2019)). Exploration is well studied in reinforcement learning (Houthooft et al., 2016; Burda et al., 2019), but existing approaches such as ε-greedy do not work for premise selection because of the very large (practically unbounded) action space. We work in the setting of automating higher-order logic interactive theorem provers, since this is where there is the most promise for building and formalizing large theories; indeed, all large-scale formalization efforts by mathematicians have occurred in such systems (Gonthier, 2008; Hales et al., 2017). Several works have explored RL for proof search in the context of connection provers (Färber et al., 2017; Kaliszyk et al., 2018; Zombori et al., 2019; 2020). We are instead interested in addressing the issue of premise selection from a large knowledge base, through deep reinforcement learning and without the use of human proofs; this is the hard part of exploration, due to the large repository of premises. Premise selection itself has been an active research topic in automated theorem proving (Alama et al., 2014; Kaliszyk and Urban, 2015; Blanchette et al., 2016; Wang et al., 2017). Gauthier et al. (2017) uses a tf-idf based premise selection model, but does not learn a model. Urban et al. (2008); Kaliszyk et al. (2014); Kaliszyk and Urban (2014); Piotrowski and Urban (2018) interleave runs of an automated theorem prover and a premise selection model using non-deep RL approaches. Deep learning has since significantly improved the state of the art for premise selection, starting with Alemi et al. (2016), but these approaches have relied on human proofs. In our work, we use deep RL to learn premise selection while removing this dependence on human proofs. We also provide a clear comparison of the effect of the availability of human proofs on final theorem proving performance, which has been lacking in the literature.
We use the HOList environment (Bansal et al., 2019) for HOL Light (Harrison, 1996) . Other ML environments for proof assistants include GamePad (Huang et al., 2019) and CoqGym (Yang and Deng, 2019) for Coq; and TacticToe (Gauthier et al., 2017) for HOL4 (Slind and Norrish, 2008) .

3. BACKGROUND

Theorem proving. Proof assistants have been built to enable humans to write and then automatically check proofs. In contrast to mathematical textbooks and papers, which are written mostly in natural language, we refer to mathematics formalized in proof assistants as formal mathematics. In this work we focus on the proof assistant HOL Light (Harrison, 1996), in which a wide range of mathematical theories have been formalized, and which has famously been used for the formalization of the Kepler conjecture (Hales et al., 2017). HOL Light, like many other proof assistants, relies mostly on "backward" proof steps. In contrast to "forward" proof steps, in which we only manipulate already proven statements, backward proofs start with a proof goal (the statement of the theorem to be proven) and apply proof tactics until all goals are proven. In Figure 1, we give an example of a backward proof. The goal here is to prove x + 0 = x, for all x ∈ N, and we apply the tactic MATCH_MP_TAC to the goal. Like many tactics, this tactic takes a previously proven theorem as a parameter.



Figure 1: Formally proving ∀x ∈ N : x + 0 = x.
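An analogous backward proof can be sketched in Lean 4 (shown here in place of the paper's HOL Light syntax, purely for illustration). Note that in Lean, `x + 0 = x` holds definitionally because `Nat.add` recurses on its second argument, so we also prove `0 + x = x`, which genuinely requires the backward pattern of applying tactics until all generated subgoals are closed:

```lean
-- Holds by definitional unfolding of Nat.add.
theorem add_zero_ex (x : Nat) : x + 0 = x := rfl

-- Backward proof: `induction` splits the goal into subgoals,
-- which further tactics close one by one.
theorem zero_add_ex (x : Nat) : 0 + x = x := by
  induction x with
  | zero => rfl
  | succ n ih => rw [Nat.add_succ, ih]
```

In HOL Light the situation is reversed: addition recurses on its first argument, so x + 0 = x is the statement that requires induction, as in Figure 1.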
