EXPLAINING PATTERNS IN DATA WITH LANGUAGE MODELS VIA INTERPRETABLE AUTOPROMPTING

Abstract

Large language models (LLMs) have displayed an impressive ability to harness natural language to perform complex tasks. In this work, we explore whether we can leverage this learned ability to find and explain patterns in data. Specifically, given a pre-trained LLM and data examples, we introduce interpretable autoprompting (iPrompt), an algorithm that generates a natural-language string explaining the data. iPrompt iteratively alternates between generating explanations with an LLM and reranking them based on their performance when used as a prompt. Experiments on a wide range of datasets, from synthetic mathematics to natural-language understanding, show that iPrompt can yield meaningful insights by accurately finding ground-truth dataset descriptions. Moreover, the prompts produced by iPrompt are simultaneously human-interpretable and highly effective for generalization: on real-world sentiment classification datasets, iPrompt produces prompts that match or even improve upon human-written prompts for GPT-3. Finally, experiments with an fMRI dataset show the potential for iPrompt to aid in scientific discovery.

1. INTRODUCTION

Large language models (LLMs) have attained an extraordinary ability to harness natural language for solving diverse natural-language problems (Devlin et al., 2018), often without the need for finetuning (Brown et al., 2020; Sanh et al., 2021). Moreover, LLMs have demonstrated the capacity to excel at real-world problems, such as mathematics (Lewkowycz et al., 2022) and scientific question answering (Sadat & Caragea, 2022). In this work, we probe whether we can leverage the learned skills of an LLM to find and explain patterns in a dataset. To do so, we invert the typical problem of fitting an LLM to data and instead ask whether a fixed LLM can produce a natural-language string explaining the patterns in a dataset.

Our approach to this problem centers on prompting. Prompting has emerged as an effective method for adapting LLMs to perform new tasks (Liu et al., 2021a): a prompt string is combined with each example in a dataset before querying an LLM for an answer. While prompts were initially constructed manually, recent work has shown success in autoprompting, i.e. automatically finding a prompt via optimization (Shin et al., 2020; Li & Liang, 2021). However, previous work on learning natural-language prompts (Shin et al., 2020) does not produce prompts that are meaningful to humans.

Our method, interpretable autoprompting (iPrompt), extends autoprompting to generate a semantically meaningful natural-language prompt that explains a key characteristic of the data (see Fig. 3). For example, given a dataset of examples of addition, e.g. 2 5 ) 7 ... 3 1 ) 4, we use an LLM to yield the natural-language description Add the inputs. iPrompt is an iterative algorithm that alternates between (i) proposing candidate explanations with an LLM, (ii) reranking the candidates based on their performance when used as a prompt, and (iii) exploring new candidates.
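The propose-rerank-explore loop above can be sketched as follows. This is a minimal illustration with hypothetical helper names, not the paper's implementation: in particular, it scores candidates by exact-match accuracy on the examples, whereas the actual method scores by the LLM's likelihood of the data, and a toy deterministic stand-in replaces the pre-trained LLM.

```python
import itertools

def propose_candidates(llm, examples, n=4):
    # Step (i): ask the LLM to propose candidate explanations of the data.
    return [llm(f"Explain the pattern: {examples}") for _ in range(n)]

def score_as_prompt(llm, candidate, examples):
    # Step (ii): rerank by how well the candidate, used as a prompt,
    # lets the LLM reproduce each output y from input x.
    # (The real method uses LLM log-likelihood rather than exact match.)
    return sum(llm(f"{candidate}\n{x}") == y for x, y in examples) / len(examples)

def iprompt(llm, examples, iterations=3, keep=2):
    candidates = propose_candidates(llm, examples)
    for _ in range(iterations):
        ranked = sorted(candidates,
                        key=lambda c: score_as_prompt(llm, c, examples),
                        reverse=True)
        # Step (iii): keep the top candidates and explore new proposals.
        candidates = ranked[:keep] + propose_candidates(llm, examples, n=keep)
    return max(candidates, key=lambda c: score_as_prompt(llm, c, examples))

# Toy stand-in for a pre-trained LLM, cycling through two fixed proposals.
_proposals = itertools.cycle(["Repeat the first input", "Add the inputs"])

def toy_llm(prompt):
    if prompt.startswith("Explain"):
        return next(_proposals)
    instruction, example = prompt.split("\n", 1)
    a, b = (int(t) for t in example.split())
    return str(a + b) if instruction == "Add the inputs" else str(a)

examples = [("2 5", "7"), ("3 1", "4")]
best = iprompt(toy_llm, examples)
print(best)  # -> Add the inputs
```

The key design point the sketch preserves is that the same frozen LLM plays both roles: it generates candidate explanations and it evaluates them by how well they work as prompts on held data.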
To evaluate iPrompt, we curate a diverse collection of datasets written in natural language (Table 1), where our goal is to accurately infer a ground-truth pattern. The collection includes a number of synthetic math datasets, as well as language tasks from the Natural Instructions V2 dataset (Wang et al., 2022). We find that iPrompt outperforms baseline autoprompting methods in successfully finding a correct description across these datasets. Moreover, the generated descriptions are interpretable, allowing human auditing and enabling strong generalization performance when used as a prompt in a new setting (i.e. with a different LLM). On real-world sentiment classification datasets, iPrompt even produces prompts that match or improve upon human-written prompts for GPT-3. Finally, we qualitatively explore iPrompt on a neuroscience task, in which we seek to understand the mapping of semantic concepts in the brain from fMRI imaging (data from Huth et al. (2016)).

2. DATASET EXPLANATION TASK

Task definition Given a dataset of input-output string pairs {(x_1, y_1), ..., (x_N, y_N)}, the goal is to produce a "semantically meaningful" natural-language string that explains the relationship between x and y. We require that the string consist of human-understandable text rather than a sequence of incongruous tokens. For example, in the task shown in Fig. 3, the goal is to recover text synonymous with Add the inputs given samples of data performing addition.

Datasets Table 1 shows the four collections of datasets we study: (1) Inverse Synthetic Math, with datasets that require inferring an underlying mathematical function of one or two numbers; (2) Inverse Allen NLI (ANLI), a selection of crowdsourced language tasks (Wang et al., 2022) with easily verifiable descriptions (e.g. Find a country's capital); (3) Sentiment, consisting of four real-world sentiment classification tasks; and (4) fMRI, a dataset involving brain responses to natural language, motivated by the goal of recovering unknown explanations. In addition to data examples, the first two collections contain a ground-truth description and simple rules to test whether an extracted description matches the ground-truth one. For example, when adding two numbers (Fig. 3), the rule checks whether a description contains any of the keywords add, sum, or +. The examples in each task do not directly contain the task description: when inferring the Add two numbers task, for instance, the examples contain neither a plus sign nor any synonym of the word add such as combine. For classification tasks such as Check edibility or Check prime, the label provided in the example text is simply yes/no rather than a descriptive label such as edible/non-edible.
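A matching rule of this kind can be as simple as a case-insensitive keyword check. The sketch below is illustrative (the function name and default keyword list are assumptions, not the paper's code); the keywords shown are the ones the text gives for the addition task.

```python
def matches_groundtruth(description, keywords=("add", "sum", "+")):
    """Return True if the description mentions any ground-truth keyword."""
    text = description.lower()
    return any(keyword in text for keyword in keywords)

print(matches_groundtruth("Add the inputs"))             # -> True
print(matches_groundtruth("Return the sum of the two"))  # -> True
print(matches_groundtruth("Subtract the second input"))  # -> False
```

Note that a plain substring check like this would also fire on unrelated words containing a keyword (e.g. "address" contains "add"); the paper's actual per-task rules may be stricter.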



Figure 1: Interpretable autoprompting (iPrompt) inverts the standard prediction problem to instead find a natural-language explanation of the data using a fixed, pre-trained large language model (LLM).

Table 1: Datasets for the dataset explanation task. For full details on each dataset, see Appendix A.1.

