LEARNING TO REASON WITH RELATIONAL ABSTRACTIONS

Abstract

Large language models have recently shown promising progress in mathematical reasoning when fine-tuned on human-generated sequences that walk through the steps of a solution. However, these solution sequences are not formally structured, and the resulting model-generated sequences may not reflect the kind of systematic reasoning we would expect of an expert human. In this paper, we study how to build stronger reasoning capability in language models using the idea of relational abstractions. We introduce new types of sequences that more explicitly provide an abstract characterization of the transitions through intermediate solution steps to the goal state. We find that models supplied with such sequences as prompts can solve tasks with significantly higher accuracy, and that models trained to produce such sequences solve problems better than those trained on previously used human-generated sequences and other baselines. Our work thus takes several steps toward elucidating and improving how language models perform on tasks requiring multi-step mathematical reasoning.

1. INTRODUCTION

Deep learning has had tremendous success in a wide range of domains, such as vision (He et al., 2016), language (Brown et al., 2020), and playing games at superhuman levels (Mnih et al., 2015; Silver et al., 2016; Vinyals et al., 2019). Yet despite these accomplishments, these systems remain limited in their formal and mathematical reasoning abilities (Saxton et al., 2019; Cobbe et al., 2021; Hendrycks et al., 2021). Although there have been impressive recent gains (Lewkowycz et al., 2022), models still struggle to succeed at harder problems. Recent work suggests that neural networks, like humans, benefit from relying on a chain of reasoning steps rather than attempting to produce the final output as a direct mapping from the problem prompt (Recchia, 2021; Nye et al., 2021; Hendrycks et al., 2021; Cobbe et al., 2021; Lewkowycz et al., 2022). These works rely entirely on naturalistic data and manipulations, in the sense that problems and their step-wise solutions are taken as they are found in existing sources, or human annotators are asked to produce a sequence of solution steps using numbers interspersed with natural language. However, while naturalistic sentences are certainly how we often communicate our solutions to each other informally, we argue that formal and mathematical reasoning depends on identifying and exploiting the set of abstract relationships that underlies the details of the problem at hand. Even in settings where the focus is on the step-wise manipulation of quantities to obtain valid practical results, a set of abstract relationships underlies the sequence of operations. We build on this intuition by exploring the possibility that, if a problem-solver can formulate the problem under consideration at an abstract level, this will be conducive to finding the correct sequence of more specific arithmetic operations.
However, to our knowledge, no math dataset currently exists that uses natural language while also isolating key reasoning components such as entities and their relations; i.e., there is no way to train a model to convert natural language inputs into these core elements. We address this gap by proposing a new dataset, GSM8K-R, which expands the GSM8K dataset (Cobbe et al., 2021) of grade-school level math word problems with human annotations that highlight the relational abstractions central to mathematical reasoning. We also introduce a new synthetic task, the unit conversion (UC) task, in which the abstract relational problem is reduced to its essence, enabling controlled analyses without the complications that arise from naturalistic datasets.

Table 1: Math Question: Janet's ducks lay 16 eggs per day. She eats 3 for breakfast every morning and bakes muffins for her friends every day with 4. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much does she make every day?

At their core, both tasks involve reasoning about how different quantities relate to each other and formulating appropriate arithmetic equations to perform the corresponding numerical computations. We can decompose each step of the solution into abstract relational reasoning and arithmetic expressions, which can then be used to recompose the solution sequence in different forms. We summarize our main contributions as follows:

• We decompose the problem-solving process into identifying the relevant abstract relationships and performing the corresponding arithmetic manipulations.

• We present a new dataset called GSM8K-R that adds relational abstraction annotations to the original GSM8K dataset (Cobbe et al., 2021) (to be released with the paper).
• We introduce a new synthetic task, the Unit Conversion (UC) task, that brings out the importance of engaging with relational abstractions, even in smaller transformer models.

• We find that teaching models to identify the relevant abstract relationships on training problems can lead to substantial performance gains at test time, and we identify several factors affecting this outcome.

• We find that identifying the crucial abstract relationships remains a challenge, and that providing the relational abstraction at test time can produce drastic gains.

Taken together, we believe these findings highlight the importance of identifying the relevant abstract relations to enable correct formal and mathematical reasoning. In the discussion, we consider next steps that may allow the development of artificial systems that capture this ability.

2. INCORPORATING RELATIONAL ABSTRACTION

In this section, we describe our framework for incorporating relational abstractions into mathematical reasoning. We begin with the notion that mathematical problem solving involves determining the values of unknown quantities from known quantities, where a quantity is a numerical attribute of an item or set, such as the price of an item or the number of items in the set. Quantities can be derived from other quantities using rules that apply to quantities of the relevant types. For example, as in the problem shown in Table 1, the amount earned from selling some number of items (in this case, eggs) is equal to the product of the number of items sold and the price per item. In general, mathematical problem solving requires several operations on given quantities to obtain a final answer: a specified target or goal quantity. In the problem in Table 1, we are given the number of eggs Janet's ducks lay each day, the eggs eaten for breakfast, and the eggs used in baking, and we are told that she sells the remainder for a specified price per egg. To solve for how much money she makes, we must first determine the remainder by subtracting the number of eggs eaten and the number of eggs used in baking from the number laid, and then determine the amount earned by multiplying the remaining number of eggs by the price per egg.
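The two-step decomposition described above can be made concrete with a minimal sketch, in which each arithmetic step is annotated with the abstract relation that licenses it (the variable names and relation phrasings here are illustrative, not taken from the paper):

```python
# Step-wise solution to the Table 1 problem, separating the abstract
# relation applied at each step from the arithmetic it licenses.

eggs_laid = 16       # eggs laid per day
eggs_breakfast = 3   # eggs eaten for breakfast
eggs_baking = 4      # eggs used for muffins
price_per_egg = 2    # dollars per egg sold

# Step 1 -- relation: remainder = total - parts used
eggs_sold = eggs_laid - eggs_breakfast - eggs_baking  # 16 - 3 - 4 = 9

# Step 2 -- relation: earnings = quantity sold * price per item
earnings = eggs_sold * price_per_egg  # 9 * 2 = 18

print(earnings)  # 18
```

Each line pairs a relation over quantity types (remainder, earnings) with a specific arithmetic instantiation, which is exactly the split exploited in the sequence formats below.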



Figure 1: We explore abstract relational reasoning by partitioning the reasoning process into an abstract relational part and a numeric part, and compare four different possibilities. Numeric only (NN): only numeric steps are provided, without any relational tokens. Relational-first (RRNN): the abstract relational parts are stated before the numeric parts. Interleaved (RNRN): relational and numeric parts occur in alternating sequence. Multitask (RR|NN): the network learns to produce either the abstract relational or the numeric sequence in response to a task prompt, and is prompted for the numeric sequence at test time.
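As a rough sketch of how the four formats in Figure 1 could be assembled, suppose each solution step is a pair of a relational token and a numeric token; the formats then differ only in how the pairs are recomposed into a target sequence (the function names and token strings here are our own illustrations, not the paper's implementation):

```python
# Per-step (relational, numeric) pairs for a two-step solution.
steps = [
    ("remainder = total - parts used", "16 - 3 - 4 = 9"),
    ("earnings = quantity * price", "9 * 2 = 18"),
]

def numeric_only(steps):        # NN: numeric steps only
    return [n for _, n in steps]

def relational_first(steps):    # RRNN: all relations, then all numerics
    return [r for r, _ in steps] + [n for _, n in steps]

def interleaved(steps):         # RNRN: relation/numeric alternation
    return [tok for r, n in steps for tok in (r, n)]

def multitask(steps, target):   # RR|NN: one target per task prompt
    return {"relational": [r for r, _ in steps],
            "numeric": [n for _, n in steps]}[target]

print(interleaved(steps))
```

Under the multitask format, training would draw from both `multitask(steps, "relational")` and `multitask(steps, "numeric")`, while evaluation prompts only for the numeric sequence.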

