REPOSITORY-LEVEL PROMPT GENERATION FOR LARGE LANGUAGE MODELS OF CODE

Abstract

With the success of large language models (LLMs) of code and their use as code assistants (e.g., Codex (Chen et al., 2021), used in GitHub Copilot [1]), techniques for introducing domain-specific knowledge into the prompt design process become important. In this work, we propose a framework called the Repo-Level Prompt Generator that learns to generate example-specific prompts using prompt proposals. The prompt proposals take context from the entire repository, thereby incorporating both the structure of the repository and the context from other relevant files (e.g., imports, parent class files). Our technique does not require any access to the weights of the LLM, making it applicable in cases where we only have black-box access to the LLM. We conduct experiments on the task of single-line code autocompletion using code repositories taken from Google Code archives. We demonstrate that an oracle constructed from our prompt proposals gives a remarkably high relative improvement of 36% over Codex, showing the quality of these proposals. Further, we show that when we train a model to predict a prompt proposal, we achieve significant performance gains over Codex and other baselines.

1. INTRODUCTION

Large Language Models (LLMs) have demonstrated remarkable performance in natural language processing tasks (Brown et al., 2020; Chowdhery et al., 2022), text-to-image generation (Ramesh et al., 2022; Rombach et al., 2021), protein sequence modeling (Rives et al., 2019), and even as a generalized agent (Reed et al., 2022). As opposed to the pretrain-finetune paradigm, prompting these LLMs has been found to yield good performance even with few examples (Liu et al., 2021a). A prompt is an input to the LM crafted so that the desired task can be expressed as predictions generated from the LM. Besides providing a mechanism to control and evaluate an LM, prompts have been shown to elicit emergent behavior as well. Examples of this behavior include GPT-3 (Brown et al., 2020) performing well on tasks it has never seen during training, and improved reasoning capabilities with few-shot (Wei et al., 2022) and zero-shot (Kojima et al., 2022) prompts that encourage a chain of thought. These factors highlight the importance of designing an effective task-specific prompt [2]. However, we currently have a limited understanding of how to do this (Reynolds & McDonell, 2021).

LLMs have also been used for modeling source code with impressive results (Austin et al., 2021; Fried et al., 2022; Xu et al., 2022a). In particular, one of the best-performing LLMs, Codex (Chen et al., 2021), has been deployed as part of GitHub Copilot [1], a state-of-the-art in-IDE code assistant. Despite the growing popularity of LLMs of code, no work systematically tackles the different aspects of prompt generation for source code. One such aspect is that for code, the relevant context to put in the prompt can come not just from the current file, but also from outside it, such as imports and parent classes. Moreover, depending on the scenario, the relevant context can be scattered across multiple locations. Since LLMs have a limited context length available for the prompt, it becomes increasingly crucial for domain-specific understanding to guide the selection of relevant context. Currently, it is not clear how to integrate this domain knowledge of what constitutes relevant context into the process of creating prompts. Addressing this question has potential benefits in other domains such as question answering (Liu et al., 2022) and multi-document summarization (Xiao et al., 2022), where domain-specific structured retrieval of context can be useful.
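To make the black-box setting concrete, the sketch below shows how a single-line completion could be requested from Codex through the legacy (pre-1.0) OpenAI completions API, which was the interface available when Codex was offered. The engine name, sampling parameters, and stop sequence are illustrative assumptions, not details taken from this work.

```python
import openai  # legacy (pre-1.0) OpenAI SDK, which exposed the Codex completions endpoint

openai.api_key = "YOUR_API_KEY"  # black-box access: we see only inputs and outputs, never weights


def complete_line(prompt: str) -> str:
    """Ask the model to complete the current line given a prompt string.

    stop="\n" restricts the prediction to a single line, matching the
    single-line code-autocompletion task considered here.
    """
    response = openai.Completion.create(
        engine="code-davinci-002",  # Codex engine name at the time (assumption)
        prompt=prompt,
        max_tokens=64,
        temperature=0.0,  # greedy decoding for deterministic completions
        stop="\n",
    )
    return response["choices"][0]["text"]


# The prompt is just a string; here, the code preceding the cursor.
print(complete_line("int add(int a, int b) { return "))
```

Everything the model knows about the task must be packed into that single prompt string, which is why selecting the right repository context matters so much.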



[1] https://copilot.github.com/
[2] Platforms such as PromptBase (https://promptbase.com/) allow buying and selling of prompts.
[3] https://openai.com/blog/openai-codex/


In this work, we address this problem by proposing the Repo-Level Prompt Generator (RLPG), a framework that, while generating the prompt, incorporates both the structure of the repository and the relevant context from all files in the repository. In RLPG, the choice of where in the repository to take context from and what to take is specified by a set of prompt proposals. For example, one prompt proposal could be to take all the identifiers used in the first import file. These prompt proposals allow prompt engineers to inject their domain expertise into the prompt-design process. With the increasing use of LLMs as assistive agents to humans, the demand for transparency, and the desire of software engineers to take an active part in tailoring prompts to their requirements (Jiang et al., 2022; Sun et al., 2022), this capability becomes important. As suggested in some previous works in NLP (Shin et al., 2020; Schick & Schütze, 2021), our prompt proposals are discrete. However, rather than fixing one particular prompt proposal for all examples, we predict the best prompt proposal conditioned on the example. We do this with a neural network called the Prompt Proposal Classifier (PPC) that, given an example, learns to select a prompt proposal such that the resulting prompt is likely to produce the desired output. RLPG therefore allows the introduction of domain expertise while at the same time facilitating automatic, example-specific prompt generation via a learned neural network. Note that some techniques for automatic prompt generation in NLP (Li & Liang, 2021; Shin et al., 2020; Lester et al., 2021) require updating some or all of the weights of the LLM. However, the strongest LLMs are not publicly available (e.g., OpenAI provides access only to the outputs generated by Codex via an API [3]; no access to model weights or training data is provided), making these techniques less useful in this scenario. RLPG addresses this limitation by generating prompts assuming only black-box access to the LLM.

We focus on the task of single-line code autocompletion in an IDE, where the objective is to predict the blanked-out portion (or target hole) starting from the position of an imagined cursor to the end of the line. We operate under the line-level maintenance setting (Shrivastava et al., 2020; Hellendoorn & Devanbu, 2017) that reflects the scenario where a user is editing an existing file; this means there can be code following the line. Figure 1 provides an illustration of our approach. The prompt proposal classifier takes as input the hole position (the position of the cursor) in the current file, the repository to which the current file belongs, and a set of repo-level prompt proposals, and predicts a prompt proposal. In our illustrated example, the predicted prompt proposal corresponds to taking the method names and bodies from MaximizingGibbsSampler.java (the mg. before the hole position indicates that a method from the imported file is likely to be invoked). The Prompt Composer takes the context from the predicted prompt proposal and combines it with the default Codex context to produce the final prompt.
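As a minimal sketch of what a prompt proposal might look like in code, the function below implements the example proposal mentioned above: take all identifiers used in the first import file. The file-resolution logic and regex-based identifier extraction are simplifying assumptions (a real implementation would use a proper Java parser), and none of the names below come from the RLPG codebase.

```python
import re
from pathlib import Path

# Rough regexes for Java; a real implementation would use a parser.
IMPORT_RE = re.compile(r"^\s*import\s+([\w.]+)\s*;", re.MULTILINE)
IDENT_RE = re.compile(r"\b[A-Za-z_]\w*\b")
JAVA_KEYWORDS = {"import", "package", "public", "private", "protected",
                 "class", "static", "void", "int", "return", "new",
                 "if", "else", "for", "while", "final"}


def first_import_file(current_file: Path, repo_root: Path) -> Path | None:
    """Resolve the first import in current_file to a file inside the repository."""
    match = IMPORT_RE.search(current_file.read_text())
    if match is None:
        return None
    # e.g. "com.example.MaximizingGibbsSampler" -> "com/example/MaximizingGibbsSampler.java"
    candidate = repo_root / (match.group(1).replace(".", "/") + ".java")
    return candidate if candidate.exists() else None


def proposal_identifiers_from_first_import(current_file: Path, repo_root: Path) -> str:
    """Prompt proposal: all identifiers used in the first import file."""
    imported = first_import_file(current_file, repo_root)
    if imported is None:
        return ""
    identifiers = [tok for tok in IDENT_RE.findall(imported.read_text())
                   if tok not in JAVA_KEYWORDS]
    # Deduplicate while preserving order of first occurrence.
    return " ".join(dict.fromkeys(identifiers))
```

Each prompt proposal in the set is a function of this shape: it maps a (hole position, repository) pair to a context string, and the proposals differ only in which source (imports, parent classes, sibling files, the current file) and which content type (identifiers, method names, method bodies, etc.) they draw from.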
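The Prompt Proposal Classifier itself can be pictured as a small network over an encoding of the hole's surrounding context that outputs a distribution over the fixed set of prompt proposals. The sketch below uses a bag-of-embeddings encoder and an arbitrary proposal count purely for illustration; the paper's actual architecture, features, and training objective are not reproduced here.

```python
import torch
import torch.nn as nn


class PromptProposalClassifier(nn.Module):
    """Toy PPC: map a hole's context tokens to logits over prompt proposals."""

    def __init__(self, vocab_size: int, embed_dim: int, num_proposals: int):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, embed_dim)  # bag-of-embeddings encoder (illustrative)
        self.head = nn.Linear(embed_dim, num_proposals)

    def forward(self, token_ids: torch.Tensor, offsets: torch.Tensor) -> torch.Tensor:
        return self.head(self.embed(token_ids, offsets))  # logits over prompt proposals


# Training signal (conceptually): a proposal is a positive example when the
# prompt it produces leads the LLM to generate the target hole.
model = PromptProposalClassifier(vocab_size=50_000, embed_dim=128, num_proposals=16)
token_ids = torch.randint(0, 50_000, (20,))     # flat token ids for one hole's context
offsets = torch.tensor([0])                     # one example in the batch
proposal_logits = model(token_ids, offsets)
best_proposal = proposal_logits.argmax(dim=-1)  # index of the predicted prompt proposal
```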
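Finally, the role of the Prompt Composer can be sketched as follows: given the context string produced by the predicted prompt proposal and the default Codex context (the code preceding the hole), allocate the limited prompt window between the two. The fifty-fifty budget split and the whitespace-based token count are placeholder assumptions; a real system would count tokens with the model's own tokenizer and use its own allocation rule.

```python
def compose_prompt(proposal_context: str,
                   default_context: str,
                   total_budget: int = 2048,
                   proposal_fraction: float = 0.5) -> str:
    """Combine prompt-proposal context with the default Codex context.

    Tokens are approximated by whitespace splitting purely for illustration;
    a real implementation would use the model's tokenizer.
    """
    def truncate(text: str, budget: int, keep_tail: bool) -> str:
        tokens = text.split()
        # For the default context, keep the tokens closest to the hole (the tail).
        kept = tokens[-budget:] if keep_tail else tokens[:budget]
        return " ".join(kept)

    proposal_budget = int(total_budget * proposal_fraction)
    proposal_part = truncate(proposal_context, proposal_budget, keep_tail=False)
    default_part = truncate(default_context,
                            total_budget - len(proposal_part.split()),
                            keep_tail=True)
    # Proposal context first, then the code immediately preceding the hole,
    # so the model's continuation follows directly from the cursor position.
    return proposal_part + "\n" + default_part
```

Keeping the default context adjacent to the end of the prompt preserves the standard left-to-right completion setup: whatever the model generates next is exactly the candidate for the target hole.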

