RECURSION OF THOUGHT: DIVIDE AND CONQUER REASONING WITH LANGUAGE MODELS

Abstract

With the recent advances in language models, attempts are being made to apply them to multi-step reasoning problems. A major breakthrough in this line of research is to let language models generate intermediate steps, often called Chain of Thought (CoT), before producing a final answer. However, language models have an upper bound on the context size, i.e., the number of input tokens, such as 2048 for the recent GPT-3 and PaLM. Although several thousand tokens are enough to handle various tasks, solving more complex reasoning tasks can require orders of magnitude more tokens. Therefore, the context limit imposes a fundamental limit on the model's reasoning capability. Inspired by humans' incredible reasoning ability based on abstraction and recursion, we propose Recursion of Thought (RoT), a model-agnostic framework built on the novel paradigm of teaching a language model to divide and conquer complex problems by recursively creating multiple contexts. Since RoT casts the context-related operations as tokens, a language model can trigger the recursion operations by simply producing the corresponding tokens. On multiple arithmetic and algorithmic reasoning tasks, we demonstrate that RoT dramatically improves the ability of the recent large-scale language model GPT-3 to solve extremely complex problems. Moreover, RoT enables tiny, randomly initialized Transformers or LSTMs to solve problems that even humans find daunting.

1. INTRODUCTION

Recently, language models (LMs) have become a prominent approach to solving reasoning problems. Given a question sequence, the model is tasked with predicting the answer sequence that follows. One recent line of research for reasoning with LMs is chain of thought (CoT) generation (Nye et al., 2021; Wei et al., 2022; Kojima et al., 2022; Lewkowycz et al., 2022). In CoT generation, complex reasoning problems are solved by generating intermediate reasoning steps, or a chain of thought, before producing the final answer. Directly answering a question would require a model to fully solve the problem in a single forward pass, meaning the range of solvable problems is severely limited by the model's capacity. On the other hand, generating CoT before the answer allows the problem's complexity to be spread across the CoT, making each token generation more straightforward given the previous tokens. This is closer to how humans solve complex problems, as we think step by step instead of producing an answer reflexively.

Although CoT seems promising, there is a critical issue that significantly limits its utility: the effective context size of sequence models cannot grow unbounded. In this work, context refers to the set of input tokens that a model is conditioned on when generating output. In practice, all sequence models have a limit on the maximum context length, for various reasons. For instance, Transformers (Vaswani et al., 2017) suffer from a computational cost that is quadratic in the context length, and RNNs (Hochreiter & Schmidhuber, 1997) struggle with long-term dependency modeling. Therefore, even state-of-the-art language models, such as GPT-3 (Brown et al., 2020) and PaLM (Chowdhery et al., 2022), limit the maximum context length to 2048 tokens. However, the length of the intermediate steps can grow rapidly with the problem's complexity and exceed the context limit.
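To make the growth of intermediate steps concrete, the following is a minimal sketch that writes out schoolbook multiplication as a flat chain of steps and measures its length. The step format, the function names `multiplication_cot` and `cot_length`, and the approximation of one character per token are all assumptions made for illustration; they are not the format or tokenization used in this work.

```python
CONTEXT_LIMIT = 2048  # context size cited in the text for GPT-3 and PaLM


def multiplication_cot(a: int, b: int) -> list:
    """Write out schoolbook multiplication of a by b as a flat list of steps.

    The step format is hypothetical and chosen only for illustration.
    """
    steps = []
    total = 0
    # Process the digits of b from least to most significant.
    for place, digit_char in enumerate(reversed(str(b))):
        partial = a * int(digit_char) * 10 ** place
        steps.append(f"{a}*{digit_char}e{place}={partial}")
        total += partial
        steps.append(f"sum={total}")
    steps.append(f"answer={total}")
    return steps


def cot_length(n_digits: int) -> int:
    """Length of the chain for two n-digit operands made of 9s.

    Assumes one character is roughly one token, which is a simplification.
    """
    operand = int("9" * n_digits)
    return sum(len(step) for step in multiplication_cot(operand, operand))


if __name__ == "__main__":
    for n in (8, 16, 32, 64):
        length = cot_length(n)
        status = "fits" if length <= CONTEXT_LIMIT else "exceeds limit"
        print(f"{n:>3}-digit operands: {length:>6} characters ({status})")
```

Under these assumptions the chain grows roughly quadratically with the number of digits, since each of the n partial products is itself about n digits long, so a single flat context is outgrown well before the operands become very large.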
Since CoT can handle a problem only if the process of solving it fits into a single context, the range of problems that CoT can handle is severely constrained by the context limit. This issue must be

