ZERO-LABEL PROMPT SELECTION

Abstract

Natural language prompts have been shown to facilitate cross-task generalization for large language models. However, with no or limited labeled examples, cross-task performance is highly sensitive to the choice of prompt, and selecting a high-performing prompt is challenging given the scarcity of labels. To address this issue, we propose Zero-Label Prompt Selection (ZPS), a method that selects prompts without any labeled data or gradient updates. Specifically, given candidate human-written prompts for a task, ZPS labels a set of unlabeled data with a prompt ensemble and uses the pseudo-labels for prompt selection. Experiments show that ZPS improves over prior methods by a sizeable margin in zero-label performance. We also extend ZPS to the few-shot setting and show its advantages over strong baselines such as prompt tuning and model tuning.
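The selection procedure described in the abstract can be summarized in three steps: collect each candidate prompt's predictions on unlabeled data, ensemble those predictions into pseudo-labels, and pick the prompt that agrees most with the pseudo-labels. A minimal sketch follows; `predict` is a hypothetical stand-in for a frozen LLM scoring one (prompt, example) pair, and all names are illustrative rather than taken from the paper's released code.

```python
from collections import Counter

def zps_select(prompts, unlabeled, predict):
    """Pick the prompt whose predictions agree most with a prompt ensemble.

    predict(prompt, x) -> predicted label; here it stands in for a call
    to a frozen LLM (no labeled data, no gradient updates).
    """
    # 1) Predictions of every candidate prompt on the unlabeled set.
    preds = {p: [predict(p, x) for x in unlabeled] for p in prompts}
    # 2) Ensemble the prompts by majority vote to obtain pseudo-labels.
    pseudo = [Counter(preds[p][i] for p in prompts).most_common(1)[0][0]
              for i in range(len(unlabeled))]
    # 3) Select the prompt with the highest pseudo-accuracy.
    return max(prompts,
               key=lambda p: sum(a == b for a, b in zip(preds[p], pseudo)))

# Toy illustration: three candidate prompts, four unlabeled inputs.
toy_preds = {"A": [0, 1, 0, 1], "B": [0, 1, 0, 0], "C": [0, 0, 0, 1]}
best = zps_select(["A", "B", "C"], range(4), lambda p, x: toy_preds[p][x])
# The majority-vote pseudo-labels are [0, 1, 0, 1], so prompt "A" is chosen.
```

In this toy run the ensemble pseudo-labels are [0, 1, 0, 1]; prompt "A" matches all four, so it is selected without touching any gold labels.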

1. INTRODUCTION

Recently, extensive studies have shown that large language models (LLMs) achieve promising few-shot performance (Brown et al., 2020; Zhao et al., 2021; Schick & Schütze, 2021; Gao et al., 2021), and they even generalize to new tasks without any annotated data (Brown et al., 2020; Wei et al., 2021; Sanh et al., 2021). Unlike conventional fine-tuning methods, which require expensive parameter updates for each downstream task, prompts provide in-context information or task instructions that guide the model to perform each task. Manually written prompts are often used to specify the task and unify the format of inputs. However, the performance of different prompts at evaluation time can vary from near state-of-the-art to random guessing; e.g., a non-optimal prompt can cause a performance drop of up to 60 points on the CB task (Zhao et al., 2021). Previous work mainly relies on using multiple prompts (Brown et al., 2020; Wei et al., 2021; Sanh et al., 2021) or a prompt ensemble (Zhou et al., 2022) to improve performance and robustness when generalizing to test tasks, while overlooking the fact that using multiple prompts substantially increases computational cost, which hinders the practical deployment of LLMs. These challenges make prompt selection an important problem.

There have been efforts to improve model performance by searching for a better prompt. Jiang et al. (2020) proposed two automatic methods to augment prompts and further explored combining the generated diverse prompts with ensemble methods. Shin et al. (2020) designed a gradient-based search method to find trigger words in a prompt. Gao et al. (2021) developed a way to use a span-corruption pretraining objective for prompt generation. Deng et al. (2022) presented RLPrompt, a prompt search method based on reinforcement learning that relies on a policy network trained with a carefully designed reward function. Prasad et al. (2022) designed an iterative prompt search algorithm that relies on human-defined edit rules to improve few-shot performance. Xu et al. (2022) proposed GPS, a genetic prompt search algorithm that leverages generative language models for prompt augmentation. Nevertheless, the main drawback of these methods is that they all require an additional labeled set to serve as a prompt scoring set or to provide rewards or gradient signals, which makes them inapplicable when no labeled samples are available.

Thus, a crucial question arises: is it possible to select a high-performing prompt without any labeled data or gradient update? In this paper, we answer this question affirmatively. To tackle the aforementioned problem, we propose ZPS (Zero-Label Prompt Selection), a simple yet effective technique for selecting a high-performing

