Decomposed Prompting: A Modular Approach for Solving Complex Tasks

Abstract

Few-shot prompting is a surprisingly powerful way to use Large Language Models (LLMs) to solve various tasks. However, this approach struggles as the task complexity increases or when the individual reasoning steps of the task themselves are hard to learn, especially when embedded in more complex tasks. To address this, we propose Decomposed Prompting, a new approach to solve complex tasks by decomposing them (via prompting) into simpler sub-tasks that can be delegated to a shared library of prompting-based LLMs dedicated to these sub-tasks. This modular structure allows each prompt to be optimized for its specific sub-task, further decomposed if necessary, and even easily replaced with more effective prompts, trained models, or symbolic functions if desired. We show that the flexibility and modularity of Decomposed Prompting allows it to outperform prior work on few-shot prompting using GPT-3. On symbolic reasoning tasks, we can further decompose sub-tasks that are hard for LLMs into even simpler solvable sub-tasks. When the complexity comes from the input length, we can recursively decompose the task into the same task but with smaller inputs. We also evaluate our approach on textual multi-step reasoning tasks: on long-context multi-hop QA, we can more effectively teach the sub-tasks via separate sub-task prompts; and on open-domain multi-hop QA, we can easily incorporate a symbolic information retrieval module within our decomposition framework, leading to improved performance on both tasks.

1. INTRODUCTION

Large Language Models (LLMs) such as GPT-3 (Brown et al., 2020) have been shown to solve various tasks given only a few examples as prompts, also referred to as in-context learning. These models can even perform more complex reasoning tasks when shown, as a prompt, the sequence of simple reasoning steps needed to perform the complex task (Wei et al., 2022; Nye et al., 2021). In essence, the sequence of reasoning steps, such as in Chain-of-Thought (CoT) prompting (Wei et al., 2022), demonstrates both how to decompose the complex task and how each reasoning step should be performed. However, as tasks become more complex, a few demonstrations of the complex task are not sufficient for current models to learn all the necessary reasoning steps. For example, few-shot demonstrations of concatenating the k-th letter of each word in a string are insufficient for GPT-3 to learn to extract the k-th letter, or to learn to answer hard single-hop questions when provided only a few demonstrations of multi-hop questions. Additionally, it is unclear whether tasks such as document retrieval and integration, needed for knowledge-intensive tasks, can even be performed by few-shot prompts. To address these limitations, we propose Decomposed Prompting (DECOMP), a new approach that solves complex tasks by decomposing them into simpler sub-tasks and delegating these to sub-task-specific LLMs, with both the decomposer and the sub-task LLMs (henceforth, sub-task handlers) having their own few-shot prompts.

Figure 1 illustrates this approach. The decomposer prompt only describes a sequence of sub-tasks (A, B, and C) needed to solve the complex task, indicated with the dashed lines. Each sub-task is then delegated to the corresponding sub-task handler shown on the right.

Using a software engineering analogy, the decomposer defines the top-level program for the complex task using interfaces to simpler sub-task functions. The sub-task handlers serve as modular, debuggable, and upgradable implementations of these simpler functions, akin to a software library. If a particular sub-task handler, say the one for identifying the k-th letter or retrieving a document, is not performing well enough, we can debug it in isolation, explore alternative prompts or implementations, and seamlessly plug the improved module back into the overall system; this gives us a systematic way to improve performance on the complex end-task.

This approach has several advantages over prior work (as also shown in the figure). The sub-task handlers can be shown a broader and richer set of examples (of the simpler task) than the specific ones needed for the complex-task prompt (task A). If a sub-task is too complex, it can be further decomposed into simpler sub-tasks (task B). Like software libraries, these sub-task handlers can be shared across multiple tasks; e.g., here tasks A and C are reused in the model for task B. As noted above, a sub-task handler can easily be swapped with an improved implementation without any change to the rest of the system. Few-shot-prompted LLMs can even be replaced with a symbolic system for tasks more suited to non-neural methods; e.g., task C uses a symbolic retrieval system such as Elasticsearch that can handle very large-scale corpora. Lastly, we can even improve upon prior work by simply adding an error-correcting sub-task handler as a post-processing step.
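To make the decomposer-handler interaction concrete, the following is a minimal Python sketch of this control loop. The names run_decomp and Handler, and the "[EOQ]" stop marker, are illustrative assumptions rather than a definitive interface; in the actual system, both the decomposer and the prompted handlers are few-shot LLM calls with more involved prompt formats and answer parsing.

from typing import Callable, Dict, List, Tuple

# A sub-task handler maps a sub-task query to an answer. It may be a
# few-shot prompted LLM, a further decomposed prompt, or a symbolic
# function such as a retrieval API.
Handler = Callable[[str], str]

# The decomposer inspects the question and the steps taken so far, and
# emits the next (sub_task_name, sub_task_query) pair.
Decomposer = Callable[[str, List[Tuple[str, str, str]]], Tuple[str, str]]

def run_decomp(question: str, decomposer: Decomposer,
               handlers: Dict[str, Handler]) -> str:
    history: List[Tuple[str, str, str]] = []  # (sub_task, query, answer)
    while True:
        sub_task, query = decomposer(question, history)
        if sub_task == "[EOQ]":       # assumed end-of-decomposition marker
            return history[-1][2]     # answer of the final sub-task
        answer = handlers[sub_task](query)
        history.append((sub_task, query, answer))

Because handlers is just a mapping, a poorly performing handler can be debugged and replaced in isolation; e.g., handlers["retrieve"] could be swapped from a prompted LLM to an Elasticsearch call without touching the rest of the system.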
To illustrate these advantages of DECOMP, we empirically evaluate it against prior work on eight challenging datasets using GPT-3 models: (1) On the task of concatenating the k-th letter of each word, we show that factoring out each sub-task allows us to more effectively teach the sub-problem of extracting the k-th letter (specifically, by decomposing it into even easier sub-tasks). (2) On the task of reversing a list, we show that DECOMP allows us to extend the capabilities of a weaker model and build a scale-invariant system by recursively decomposing the task into the reversal of smaller and smaller lists (see the sketch below). (3) On a long-context QA task (Khot et al., 2022), our approach allows each sub-task handler to accommodate more examples than is feasible with CoT prompting, leading to better QA performance. (4) On three multi-hop open-domain QA datasets (Yang et al., 2018; Ho et al., 2020; Trivedi et al., 2022), we can incorporate a symbolic retrieval (Elasticsearch) API as the handler for the retrieval sub-task, leading to better results than CoT. (5) On two math QA datasets (Cobbe et al., 2021; Roy & Roth, 2015), we can post-process CoT outputs to easily fix frequent formatting errors, resulting in a surprisingly large improvement of 14-17 points.
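As one concrete instance, the recursive structure behind finding (2) is sketched below. This is a simplified illustration under our own naming: in DECOMP the base-case reversal and the merge are carried out by prompted LLMs, and the symbolic stand-ins here only show the scale-invariant control flow.

def reverse_list(items: list, base_size: int = 3) -> list:
    # Base case: lists short enough for the few-shot prompted model to
    # reverse reliably in one step (performed symbolically here).
    if len(items) <= base_size:
        return items[::-1]
    # Recursive case: the decomposer re-poses the *same* task on the two
    # halves, then concatenates the reversed halves in swapped order.
    mid = len(items) // 2
    return (reverse_list(items[mid:], base_size)
            + reverse_list(items[:mid], base_size))

assert reverse_list(list("abcdefgh")) == list("hgfedcba")

Since the recursion only ever presents the model with lists no longer than those demonstrated in its prompt, the same prompt generalizes to arbitrarily long inputs.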

2. RELATED WORK

Few-shot Prompts for Multi-Step Reasoning Large language models (LLMs) have been shown to learn various NLP tasks given just a few examples as prompts (Brown et al., 2020). Recently, they have also been successfully applied to various multi-step reasoning tasks by providing the intermediate reasoning steps needed to arrive at the answer, i.e., Chain-of-Thought prompting (Wei et al., 2022; Chowdhery et al., 2022). An alternate approach composes multiple LLMs, or LLMs with symbolic functions, to perform multi-step reasoning (Jung et al., 2022; Creswell et al., 2023).



Figure 1: While standard approaches only provide labeled examples (shown as a grey input box with a green label box), Chain-of-Thought prompting also describes the reasoning steps to arrive at the answer for every example in the prompt. Decomposed Prompting, on the other hand, uses the decomposer prompt to only describe the procedure for solving the complex task in terms of sub-tasks. Each sub-task, indicated here with A, B, and C, is handled by a sub-task-specific handler, which can be a standard prompt (sub-task A), a further decomposed prompt (sub-task B), or a symbolic function such as retrieval (sub-task C).


