REPOSITORY-LEVEL PROMPT GENERATION FOR LARGE LANGUAGE MODELS OF CODE

Abstract

With the success of large language models (LLMs) of code and their use as code assistants (e.g., Codex (Chen et al., 2021) used in GitHub Copilot), techniques for introducing domain-specific knowledge into the prompt design process become important. In this work, we propose a framework called the Repo-Level Prompt Generator that learns to generate example-specific prompts using prompt proposals. The prompt proposals take context from the entire repository, thereby incorporating both the structure of the repository and the context from other relevant files (e.g., imports, parent class files). Our technique does not require any access to the weights of the LLM, making it applicable in cases where we have only black-box access to the LLM. We conduct experiments on the task of single-line code autocompletion using code repositories taken from Google Code archives. We demonstrate that an oracle constructed from our prompt proposals gives a remarkably high relative improvement of 36% over Codex, showing the quality of these proposals. Further, we show that when we train a model to predict a prompt proposal, we achieve significant performance gains over Codex and other baselines.

1. INTRODUCTION

Large Language Models (LLMs) have demonstrated remarkable performance in natural language processing tasks (Brown et al., 2020; Chowdhery et al., 2022), text-to-image generation (Ramesh et al., 2022; Rombach et al., 2021), protein sequencing (Rives et al., 2019), and even as generalized agents (Reed et al., 2022). As opposed to the pretrain-finetune paradigm, prompting these LLMs has been found to yield good performance even with few examples (Liu et al., 2021a). A prompt is an input to the LM crafted so that the desired task can be expressed through predictions generated by the LM. Besides providing a mechanism to control and evaluate an LM, prompts have been shown to elicit emergent behavior as well. Examples of this behavior include GPT-3 (Brown et al., 2020) performing well on tasks it has never seen during training, and improved reasoning capabilities with few-shot (Wei et al., 2022) and zero-shot (Kojima et al., 2022) prompts that encourage a chain of thought. These factors highlight the importance of designing an effective task-specific prompt. However, we currently have limited understanding of how to do this (Reynolds & McDonell, 2021).

LLMs have also been used for modeling source code with impressive results (Austin et al., 2021; Fried et al., 2022; Xu et al., 2022a). In particular, one of the best-performing LLMs, Codex (Chen et al., 2021), has been deployed as part of GitHub Copilot, a state-of-the-art in-IDE code assistant. Despite the growing popularity of LLMs of code, no work systematically tackles the different aspects of prompt generation in relation to source code. One such aspect is that for code, the relevant context to put in the prompt can come not just from the current file, but also from outside it, such as from imports and parent classes. Moreover, depending on the scenario, the relevant context can be scattered across multiple locations.
Since LLMs have a limited context length available for the prompt, it becomes increasingly crucial for domain-specific understanding to guide the selection of relevant context. Currently, it is not clear how to integrate this domain knowledge of what constitutes relevant context into the process of creating prompts. Addressing this question has potential benefits in other domains such as question answering (Liu et al., 2022) and multi-document summarization (Xiao et al., 2022), where domain-specific structured retrieval of context can be useful.
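As a concrete illustration of the problem, consider splicing context from a relevant repository file (e.g., an imported module or a parent class file) ahead of the code immediately preceding the completion point, under a fixed prompt budget. The sketch below is purely hypothetical: the function name, the character-based budget, and the even budget split are illustrative assumptions, not the actual prompt proposals or allocation scheme of the framework described in this work.

```python
def build_prompt(prior_code: str, proposal_context: str,
                 budget_chars: int = 2000) -> str:
    """Assemble a repo-level prompt (illustrative sketch only).

    prior_code: code in the current file preceding the completion point.
    proposal_context: context taken from elsewhere in the repository,
        e.g., lines from an imported file or a parent class file.
    """
    # Give up to half of the budget to the repository-level context...
    proposal_part = proposal_context[:budget_chars // 2]
    # ...and fill the remainder with the most recent code from the
    # current file (recent lines are usually most relevant), reserving
    # one character for the separating newline.
    remaining = budget_chars - len(proposal_part) - 1
    prior_part = prior_code[-remaining:] if remaining > 0 else ""
    return proposal_part + "\n" + prior_part
```

The resulting string would then be sent to the black-box LLM as its prompt; no access to model weights is needed, only the ability to construct the input.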



GitHub Copilot: https://copilot.github.com/
Platforms such as PromptBase (https://promptbase.com/) allow the buying and selling of prompts.

