LIME: LEARNING INDUCTIVE BIAS FOR PRIMITIVES OF MATHEMATICAL REASONING

Abstract

While designing inductive bias in neural architectures has been widely studied, we hypothesize that transformer networks are flexible enough to learn inductive bias from suitable generic tasks. Here, we replace architecture engineering by encoding inductive bias in the form of datasets. Inspired by Peirce's view that deduction, induction, and abduction form an irreducible set of reasoning primitives, we design three synthetic tasks intended to require the model to have these three abilities. We specifically design these synthetic tasks so that they are devoid of mathematical knowledge, ensuring that only the fundamental reasoning biases can be learned from them. This defines a new pre-training methodology called "LIME" (Learning Inductive bias for Mathematical rEasoning). Models trained with LIME significantly outperform vanilla transformers on three very different large mathematical reasoning benchmarks. Unlike traditional pre-training approaches, which dominate the overall computation cost, LIME requires only a small fraction of the computation cost of the typical downstream task.

1. INTRODUCTION

Inductive bias is essential for successful neural network learning. Many of the breakthroughs in machine learning are accompanied by new neural architectures with better inductive biases, such as locality bias in convolutional neural networks (LeCun et al., 1999), recurrence and memory in LSTMs (Hochreiter and Schmidhuber, 1997), and structural bias in graph neural networks (Scarselli et al., 2008). However, existing inductive biases need to be explicitly encoded in the neural architecture. This is sometimes difficult, as one may not know the exact mechanism underlying an abstract ability and hence cannot describe the architectural bias explicitly. In particular, designing a proper inductive bias for abstract concepts such as mathematical reasoning is extremely challenging. Moreover, attempts to design elaborate architectures for reasoning often fall short of the performance of the more generic transformer architecture.

In this work, we aim to avoid the search for new architectures and investigate whether one can learn a useful inductive bias for mathematical reasoning through pretraining. Large-scale unsupervised pretraining of language models revolutionized the field of natural language processing (NLP), improving the state of the art in question answering, named entity recognition, text classification, and other domains (e.g., Radford et al., 2018; Devlin et al., 2019; Yang et al., 2019; Liu et al., 2019; Raffel et al., 2020; Brown et al., 2020). As a result, pretraining has become common practice for modern neural-network-based NLP. One plausible explanation for the benefit of pretraining is that the model can learn world knowledge by memorizing the contents of the natural language corpus. This can be useful in various downstream natural language tasks, such as question answering and text classification.
However, there is another potential advantage of pre-training: it may distill inductive biases into the model that are helpful for training on downstream tasks (Brown et al., 2020; Warstadt and Bowman, 2020). We focus on the latter and design pre-training tasks that are intentionally devoid of knowledge and only allow the model to learn inductive bias for reasoning. Inspired by the logician Charles Peirce (Peirce, 1992), we believe that the following three primitives are the most crucial for reasoning:

1. Deduction: the ability to deduce new truths from given facts and inference rules.
2. Induction: the ability to induce general inference rules from a set of known facts.
3. Abduction: the ability to explain the relationship between evidence and inference rules.
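To make the three primitives concrete, consider a hypothetical toy instantiation (a sketch for illustration only, not the paper's actual task format): a "rule" is a pattern containing variables (uppercase letters), a "case" is a substitution assigning symbols to those variables, and the "result" is the pattern after applying the substitution. Each primitive then corresponds to inferring one of the three components from the other two:

```python
# Toy illustration of the three reasoning primitives over string
# rewriting. The rule/case/result framing here is a hypothetical
# example, not the paper's actual synthetic-task definition.
# Variables in a rule are uppercase letters; all other characters
# are literal symbols.

def deduce(rule: str, case: dict) -> str:
    """Deduction: given a rule and a case, produce the result."""
    return "".join(case.get(ch, ch) for ch in rule)

def induce(case: dict, result: str) -> str:
    """Induction: given a case and a result, recover the rule."""
    inverse = {v: k for k, v in case.items()}
    return "".join(inverse.get(ch, ch) for ch in result)

def abduce(rule: str, result: str) -> dict:
    """Abduction: given a rule and a result, explain the case."""
    # Align rule and result character by character; variables
    # (uppercase) map to the symbol they produced.
    return {r: s for r, s in zip(rule, result) if r.isupper()}

rule = "A+B"
case = {"A": "x", "B": "y"}
print(deduce(rule, case))      # -> x+y
print(induce(case, "x+y"))     # -> A+B
print(abduce(rule, "x+y"))     # -> {'A': 'x', 'B': 'y'}
```

Note how each direction withholds exactly one component: a model trained on all three directions must learn to apply, invert, and explain a substitution, without any of the tasks encoding domain knowledge.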

