LEARNING TO REASON WITH RELATIONAL ABSTRACTIONS

Abstract

Large language models have recently shown promising progress in mathematical reasoning when fine-tuned on human-generated solutions that walk through a sequence of solution steps. However, these solutions are not formally structured, and the resulting model-generated sequences may not reflect the kind of systematic reasoning we would expect an expert human to produce. In this paper, we study how to build stronger reasoning capability in language models using the idea of relational abstractions. We introduce new types of sequences that more explicitly provide an abstract characterization of the transitions through intermediate solution steps to the goal state. We find that models supplied with such sequences as prompts solve tasks with significantly higher accuracy, and models trained to produce such sequences solve problems better than those trained with previously used human-generated sequences and other baselines. Our work thus takes several steps toward elucidating and improving how language models perform on tasks requiring multi-step mathematical reasoning.

1. INTRODUCTION

Deep learning has had tremendous success in a wide range of domains, such as vision (He et al., 2016), language (Brown et al., 2020), and playing games at superhuman levels (Mnih et al., 2015; Silver et al., 2016; Vinyals et al., 2019). Yet despite these accomplishments, these systems remain limited in their formal and mathematical reasoning abilities (Saxton et al., 2019; Cobbe et al., 2021; Hendrycks et al., 2021). Although there have been recent impressive gains (Lewkowycz et al., 2022), models still struggle with harder problems. Recent work suggests that neural networks, like humans, benefit from relying on a chain of reasoning steps rather than attempting to produce the final output as a direct mapping from the problem prompt (Recchia, 2021; Nye et al., 2021; Hendrycks et al., 2021; Cobbe et al., 2021; Lewkowycz et al., 2022). These works rely entirely on naturalistic data and manipulations, in the sense that problems and their step-wise solutions are taken as they are found in existing sources, or human annotators are asked to produce a sequence of solution steps using numbers interspersed with natural language. However, while naturalistic sentences are certainly how we often communicate our solutions to each other informally, we argue that formal and mathematical reasoning depends on identifying and exploiting the set of abstract relationships that underlies the details of the problem at hand. Even in settings where the focus is on the step-wise manipulation of quantities to obtain valid practical results, a set of abstract relationships underlies the sequence of operations. We build on this intuition by exploring the possibility that, if a problem-solver can formulate the problem under consideration at an abstract level, this will be conducive to finding the correct sequence of more specific arithmetic operations.
However, to our knowledge, no math dataset currently exists that uses natural language while also isolating key reasoning components such as entities and their relations; that is, there is no way to train a model to convert natural language inputs into these core elements. We address this gap by proposing a new dataset, GSM8K-R, which expands the GSM8K dataset of grade-school math word problems (Cobbe et al., 2021) with human annotations that highlight the relational abstractions central to mathematical reasoning. We also introduce a new synthetic task, the unit conversion (UC) task, in which the abstract relational problem is reduced to its essence, enabling controlled analyses without the complications that arise in naturalistic datasets.
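To make the contrast concrete, the difference between a naturalistic step-wise solution and one prefixed with a relational abstraction can be sketched as follows. The problem and the annotation format shown here are purely illustrative, not the actual GSM8K-R format:

```python
# Illustrative sketch only: the problem and annotation format below are
# hypothetical, not taken from GSM8K-R.
problem = ("A box holds 4 bags. Each bag holds 6 apples. "
           "How many apples are in 3 boxes?")

# Naturalistic step-wise solution: numbers interspersed with language.
stepwise = [
    "Each box holds 4 * 6 = 24 apples.",
    "So 3 boxes hold 3 * 24 = 72 apples.",
]

# The same solution prefixed with an abstract characterization of the
# relations between entities, stated before any arithmetic is performed.
relational = [
    "apples_per_box = bags_per_box * apples_per_bag",
    "total_apples = boxes * apples_per_box",
    "apples_per_box = 4 * 6 = 24",
    "total_apples = 3 * 24 = 72",
]

# Evaluating the abstract plan with the problem's quantities confirms
# that it yields the same arithmetic steps.
apples_per_box = 4 * 6
total_apples = 3 * apples_per_box
print(total_apples)  # 72
```

The point of the abstract prefix is that the relations between quantities are committed to before any numbers are manipulated, which is the property the relational-abstraction sequences are meant to make explicit.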

