LEARNING TO REASON IN LARGE THEORIES WITHOUT IMITATION

Abstract

In this paper, we demonstrate how to perform automated higher-order logic theorem proving in the presence of a large knowledge base of potential premises without learning from human proofs. In a deep reinforcement learning setting, we augment the exploration of premises with a simple tf-idf (term frequency-inverse document frequency) based lookup. Our experiments show that our theorem prover trained with this exploration mechanism but no human proofs, dubbed DeepHOL Zero, outperforms provers that are trained only on human proofs, and approaches the performance of a prover trained by a combination of imitation and reinforcement learning. We perform multiple experiments to understand the importance of the underlying assumptions that make our exploration approach work, thus explaining our design choices.

1. INTRODUCTION

Theorem proving is a challenging benchmark for automated reasoning, and an important milestone on the road to demonstrating that machine learning can produce a deep understanding of abstract concepts. In the long run, automated mathematical reasoning may become an important tool in engineering and scientific discovery. Due to their success in many other areas, neural networks have recently been considered as a way to guide theorem proving (Alemi et al., 2016; Loos et al., 2017; Huang et al., 2019; Bansal et al., 2019; Paliwal et al., 2020) and to demonstrate approximate mathematical reasoning abilities in latent space (Lee et al., 2020). While only a relatively small number of fundamental proof rules (or proof tactics) is applicable at any point in a proof, there is a very large number of premises (i.e., previously proven theorems and lemmas) that could be invoked. The largest formalized libraries contain tens of thousands of theorems that can be used as premises. Thus, the main problem of reasoning in large theories is to identify the premises relevant in the current context and thereby reduce the branching factor of the proof search to a manageable size. This problem will only become more pronounced over time: as theorem provers grow more powerful, the number of available premises grows with them. Previous works have relied on human proofs to either directly provide, or learn (Bansal et al., 2019; Paliwal et al., 2020), which premises are relevant to the current proof. However, any open-ended system for mathematical reasoning needs to be able to learn which premises are relevant without human guidance. In this work, we therefore consider the problem of training a theorem prover without access to human proofs. In particular, the contributions of this work are:

1. We demonstrate that training a theorem prover without human proof data can succeed when using deep reinforcement learning. We do this with minimal additional engineering: by augmenting the exploration of premises with a portion of the premises selected by a tf-idf (Manning et al., 2008) metric.

2. We provide a first side-by-side comparison of the effect of the availability of human proofs on final theorem proving performance. We learn to prove more theorems than a prover trained on human proofs alone, and almost as many as a prover trained with a combination of both approaches.

3. We establish, through multiple ablation experiments, the underlying properties of the proof assistant and of the reinforcement learning setup that make our approach work.

We thereby remove one of the roadblocks on the way to open-ended learning of mathematical reasoning in large theories.
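To make the tf-idf-based exploration concrete, the following is a minimal, self-contained sketch of ranking candidate premises against a goal statement by tf-idf cosine similarity. It is an illustration, not the paper's implementation: the tokenization (whitespace splitting), the toy theorem statements, and the function names `tfidf_vectors`, `cosine`, and `rank_premises` are all our own assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute sparse tf-idf vectors (token -> weight) for tokenized documents."""
    n = len(docs)
    df = Counter()  # document frequency of each token
    for doc in docs:
        df.update(set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_premises(goal_tokens, premise_token_lists, k=2):
    """Return indices of the k premises most similar to the goal under tf-idf."""
    vecs = tfidf_vectors([goal_tokens] + premise_token_lists)
    goal_vec, premise_vecs = vecs[0], vecs[1:]
    scores = [cosine(goal_vec, v) for v in premise_vecs]
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]

# Hypothetical toy premise database; statements are whitespace-tokenized.
goal = "x + y = y + x".split()
premises = [
    "a + b = b + a".split(),                   # commutativity of addition
    "a * b = b * a".split(),                   # commutativity of multiplication
    "a + ( b + c ) = ( a + b ) + c".split(),   # associativity of addition
]
print(rank_premises(goal, premises, k=1))  # → [0]
```

In the reinforcement learning loop described above, such a ranking would supply only a portion of the explored premises, with the remainder drawn by the learned policy, so that exploration is biased toward syntactically relevant statements without being restricted to them.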

