Language Models Can Teach Themselves to Program Better

Abstract

Recent Language Models (LMs) achieve breakthrough performance in code generation when trained on human-authored problems, even solving some competitive-programming problems. Self-play has proven useful in games such as Go, so it is natural to ask whether LMs can generate their own instructive programming problems to improve their performance. We show that an LM can synthesize programming problems and solutions, which are filtered for correctness by a Python interpreter. The LM's performance then improves when it is fine-tuned on its own synthetic problems and verified solutions; thus the model "improves itself" using the Python interpreter. Problems are specified formally as programming puzzles [Schuster et al., 2021], a code-based problem format in which solutions can easily be verified for correctness by execution. In experiments on publicly available LMs, test accuracy more than doubles. This work demonstrates the potential for code LMs, with an interpreter, to generate instructive problems and improve their own performance.

1. Introduction

Recent Language Models (LMs) pre-trained for code generation [Chen et al., 2021; Chowdhery et al., 2022; Li et al., 2022; Austin et al., 2021] produce useful code and even achieve nontrivial performance in human programming competitions [Li et al., 2022]. LMs that solve programming problems may help make algorithmic breakthroughs in computer science, such as factoring large integers or designing faster algorithms for multiplying large matrices (useful in ML). However, LMs are generally trained on human-authored code, which contains bugs and inefficiencies that LMs reproduce [Chen et al., 2021], and whose specifications are usually ambiguous, given in English or by example. Inspired by AlphaZero's success using self-play in Go [Silver et al., 2018], it is natural to ask whether self-play could be used for learning a programming language such as Python, by which we mean: can an LM design its own programming problems to improve its problem-solving ability?

This paper demonstrates how LMs, together with an interpreter, can be used to generate diverse datasets of verified-correct code problems and solutions, which can then be used to improve the LMs themselves through fine-tuning. These synthetic curricula are not only correct but instructive, in the sense that the test performance of the LMs increases once they are fine-tuned on these diverse datasets of synthetic coding problems and solutions. Because programming is a universal aspect of computing, it is important (and perhaps surprising) to discover that these LMs are capable of generating novel and instructive problems, in addition to verified-correct solutions.

In addition to solution correctness, diversity is a key desideratum of synthetic problems. One could create a dataset of trillions of addition problems such as assert 173288 + 291124 == y, but such a dataset would be useless outside of arithmetic.
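The addition problem above fits directly into the puzzle format of Schuster et al. [2021] mentioned in the abstract. As a minimal sketch (the function names f and g follow the puzzle convention; the candidate solution here is written by hand rather than generated by an LM): a puzzle is a function f, a solution is any input y with f(y) == True, and correctness is checked mechanically by the Python interpreter.

```python
def f(y: int) -> bool:
    """Puzzle: find y making the addition assertion from the text hold."""
    return 173288 + 291124 == y

def g() -> int:
    """A candidate solution; in the paper's setting an LM would generate this."""
    return 173288 + 291124

# Verification is just execution: if the assertion passes, (f, g) is a
# verified-correct problem/solution pair suitable for fine-tuning data.
assert f(g())
```

This executability is what lets an interpreter, rather than a human, filter synthetic problems and solutions for correctness.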
Similarly, one function f could be used to create infinitely many variations by renaming its variables, but such a dataset would only teach variable naming and f. One could do the same with more problems and transformations,

* Work done while at Microsoft Research
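The renaming point can be made concrete with a toy sketch (the helper and all names here are hypothetical, not from the paper): a renamed puzzle is semantically identical to the original, so it adds volume without adding instructive content.

```python
# Original puzzle source: find y such that f(y) is True.
SRC = "def f(y):\n    return 173288 + 291124 == y\n"

def rename_var(src: str, old: str, new: str) -> str:
    """Naive textual rename; a real transformation would need
    scope-aware rewriting, but this suffices for the illustration."""
    return src.replace(old, new)

variant = rename_var(SRC, "y", "z")

# Execute both sources to obtain the two puzzle functions.
ns1, ns2 = {}, {}
exec(SRC, ns1)
exec(variant, ns2)

# Both puzzles accept exactly the same answer: nothing new is taught.
assert ns1["f"](173288 + 291124)
assert ns2["f"](173288 + 291124)
```

Diversity therefore has to come from genuinely different problems, not from superficial transformations of a fixed one.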

