AUTOMATIC CHAIN OF THOUGHT PROMPTING IN LARGE LANGUAGE MODELS

Abstract

Large Language Models (LLMs) can carry out complex reasoning tasks by generating intermediate reasoning steps. These steps are triggered by what is called chain-of-thought (CoT) prompting, which comes in two flavors: one leverages a simple prompt like "Let's think step by step" to facilitate step-by-step reasoning before answering a question (Zero-Shot-CoT). The other uses manual demonstrations, each composed of a question and a reasoning chain that leads to an answer (Manual-CoT). Unfortunately, the superior performance of the latter strategy crucially hinges on manually generating task-specific demonstrations. This makes it far less scalable and more dependent on the skill of the CoT engineer. We show that such manual efforts may be eliminated by leveraging LLMs to generate the reasoning chains on their own. Since these generated chains often come with mistakes, we propose a number of mitigation strategies. Our proposed Auto-CoT method automatically samples diverse questions and performs post-processing quality control to generate usable reasoning chains from Zero-Shot-CoT. On ten public benchmark reasoning tasks, Auto-CoT performs on par with Manual-CoT without the need for human intervention.

1. INTRODUCTION

Large language models (LLMs) (Brown et al., 2020; Thoppilan et al., 2022; Rae et al., 2021; Chowdhery et al., 2022) have performed impressively on complex reasoning tasks by decomposing multi-step problems into intermediate steps before giving answers (Nye et al., 2022). This reasoning process is elicited by a recent technique: chain-of-thought (CoT) prompting (Wei et al., 2022b). CoT prompting comes in two major flavors. One is to add a single prompt such as "Let's think step by step" after the test question to facilitate the reasoning chains in LLMs (Kojima et al., 2022). Since this strategy is task-agnostic and does not need input-output demonstrations, it is called Zero-Shot-CoT (Figure 1 left). Via Zero-Shot-CoT, LLMs have been shown to be decent zero-shot reasoners. The other strategy is to provide few-shot prompting through manual reasoning demonstrations one by one (Wei et al., 2022b). Each demonstration has a question and a reasoning chain, where the latter is composed of a rationale (a series of intermediate reasoning steps) and an expected answer. With all the demonstrations being manually designed, this strategy is referred to as Manual-CoT (Figure 1 right).
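To make the contrast concrete, the two flavors differ only in how the prompt fed to the LLM is assembled; the following is a minimal sketch in Python (the model call itself is omitted, and the demonstration text is a hypothetical example, not one from the paper):

```python
def zero_shot_cot_prompt(question: str) -> str:
    # Zero-Shot-CoT: append a single task-agnostic trigger phrase
    # after the test question; no demonstrations are needed.
    return f"Q: {question}\nA: Let's think step by step."

def manual_cot_prompt(demos: list[tuple[str, str]], question: str) -> str:
    # Manual-CoT: prepend hand-crafted (question, reasoning chain)
    # demonstrations; each chain holds intermediate steps and the answer.
    blocks = [f"Q: {q}\nA: {chain}" for q, chain in demos]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)

# Hypothetical demonstration for illustration only:
demos = [("There are 3 apples and 2 oranges. How many fruits are there?",
          "There are 3 apples and 2 oranges. 3 + 2 = 5. The answer is 5.")]
print(manual_cot_prompt(demos, "A pen costs $2 and a pad costs $3. What is the total?"))
```

The resulting strings would then be sent to the LLM; Manual-CoT's cost lies entirely in authoring the `demos` by hand.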

In practice, Manual-CoT outperforms Zero-Shot-CoT (Wei et al., 2022b; Kojima et al., 2022). However, the superior performance hinges on hand-crafting effective demonstrations. This involves nontrivial effort in designing both the questions and their reasoning chains. Even more problematic, different tasks, such as arithmetic (Roy & Roth, 2015) and commonsense reasoning (Talmor et al., 2019), require demonstrations to be manually designed in different ways. We propose Auto-CoT, which addresses these problems in Manual-CoT by automatically constructing demonstrations with questions and reasoning chains, leveraging LLMs themselves: it generates the reasoning chains with Zero-Shot-CoT via the prompt "Let's think step by step". Unfortunately, a naive approach is insufficient. For example, given a test question of a dataset, retrieving semantically

