LEARNING TO IMPROVE CODE EFFICIENCY

Abstract

Improvements in the performance of computing systems, driven by Moore's Law, have transformed society. As such hardware-driven gains slow down, it becomes even more important for software developers to focus on performance and efficiency during development. While several studies have demonstrated the potential of such improved code efficiency (e.g., 2x better generational improvements compared to hardware), unlocking these gains in practice has been challenging. Reasoning about algorithmic complexity and the interaction of coding patterns on hardware is difficult for the average programmer, especially when combined with pragmatic constraints around development velocity and multi-person development. This paper seeks to address this problem. We analyze a large competitive programming dataset from the Google Code Jam competition (Google Code-Jam) and find that efficient code is indeed rare, with a 2x runtime difference between the median and the 90th percentile of solutions. We propose using machine learning to automatically provide prescriptive feedback in the form of hints to guide programmers towards writing high-performance code. To learn these hints automatically from the dataset, we propose a novel discrete variational auto-encoder in which each discrete latent variable represents a different learned category of code edit that increases performance. We show that this method represents the multi-modal space of code efficiency edits better than a sequence-to-sequence baseline and generates a distribution of more efficient solutions.

1. INTRODUCTION

The computational efficiency of code is often front-and-center in any computer science curriculum. While there are many ways to solve a particular problem, there is often wide variance in the runtime of different implementations. This variance is attributable to many different factors: the algorithmic complexity of the code in question, the data structures that are used, the libraries that are called, and lower-level execution effects like efficient caching or memory usage.

Computational efficiency is likewise a critical component of professional software development. The computing industry as a whole has relied on the automatic performance increases of Moore's Law to scale massive warehouse computing systems to meet the internet requirements of the world. As these automatic performance increases slow down, the burden of reducing computational cost and carbon footprint now falls on writing high-performance code (Patterson et al., 2021). Writing efficient code is challenging, even for experienced programmers, as it requires understanding computational complexity as well as the underlying hardware. Lower-level performance optimizations are therefore automated by compilers, which apply a small set of known, sound low-level program transformations to an already written program to increase its efficiency. However, compilers and current tooling have more difficulty identifying higher-level optimizations, such as more efficient algorithms for the same problem. So far, these types of optimizations could only be identified by humans.

We hypothesize that machine learning can be used to guide humans towards such optimizations by suggesting edits that optimize code efficiency. To study this problem, we examine a competitive programming dataset where tens of thousands of developers have submitted answers to about 180 different questions.
Studying these solutions, we find wide variance in computational cost: the runtime difference between a median solution and the 90th percentile is over two-fold. The scarcity of high-performance solutions highlights the difficulty of our task. We therefore aim to provide prescriptive feedback to developers that guides them towards writing high-performance code. We develop a framework to apply multiple categorical transformations to a single program using a novel discrete variational autoencoder in which different vectors in the latent dictionary lead to different code transformations. We find that these learned categories are often consistent (e.g., a particular latent variable may control the data structure that is used for a particular problem, or a for-loop vs. a while-loop), and that by applying these transformations, we can move solutions into a more efficient runtime category than the code the developer originally wrote.

This paper makes the following contributions:

• We frame code efficiency optimization as a generative problem.

• Using the Google Code Jam competitive programming dataset (Google Code-Jam), we analyze the distribution and characteristics of high-performance solutions. We find that high-performance solutions are uncommon and consist of a combination of many distinct optimizations. We then derive a canonicalized program-edits dataset to train models to improve code efficiency.

• We propose a novel discrete generative latent-variable model of program edits to model a distribution of fast programs, conditioned on their slower counterparts. We find that this model outperforms a sequence-to-sequence baseline along three different axes: correctness, efficiency, and diversity.

• We qualitatively demonstrate that the learned discrete latent variables represent different edits, and that the edits assigned to one latent variable are generally consistent. As a side-effect, we learn an interpretable program embedding space.
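The paper's exact architecture is not reproduced here; as a minimal illustrative sketch, discrete latent variables of this kind are commonly trained with a Gumbel-softmax relaxation, where an encoder produces logits over K edit categories and a relaxed one-hot sample selects a category. The function and parameter names below are hypothetical and not from the paper:

```python
import numpy as np

def gumbel_softmax_sample(logits, temperature=1.0, rng=None):
    """Sample a relaxed one-hot vector over K discrete edit categories.

    Illustrative sketch only; the paper's actual training procedure
    may differ.
    """
    rng = rng or np.random.default_rng()
    # Gumbel(0, 1) noise makes the argmax a sample from softmax(logits).
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))
    y = (logits + gumbel) / temperature
    # Numerically stable softmax over the perturbed logits.
    e = np.exp(y - y.max())
    return e / e.sum()

# Hypothetical encoder output over K=4 learned edit categories.
logits = np.array([2.0, 0.5, 0.1, -1.0])
soft = gumbel_softmax_sample(logits, temperature=0.5)
category = int(np.argmax(soft))  # hard edit category at inference time
```

At low temperature the relaxed sample approaches a one-hot vector, so each sampled `category` can index a distinct learned code transformation.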
We believe that these results are a promising step towards automating the process of identifying and applying higher-level performance optimizations, which would fundamentally increase the capabilities of current developer tools while reducing the carbon footprint of computing.

2.1. PROBLEM FORMULATION

There are many different ways to implement a particular algorithm. For algorithms like matrix multiplication, small syntactic changes like loop reordering have a dramatic impact on execution cost (Leiserson et al., 2020). From low-level hardware effects like caching and branch prediction to higher-level code choices like data structures, termination conditions, and loops, navigating the space of implementation options is a key element of software engineering.

We find many of these performance archetypes when looking at competitive programming solutions. Figure 1 shows three example programs on the left and faster versions of those programs on the right. The first example showcases using a more efficient data structure (a heap), which then enables early termination of the main loop. The second example highlights a performance bug, where fewer API calls can accomplish the same task. The third example highlights how using built-in libraries can be faster than writing bespoke implementations. These examples represent just three of the many discrete design choices that developers make while coding their solutions. We hypothesize that these discrete choices can be learned, such that a generative model can suggest different code transformations that a developer could leverage to increase code efficiency.
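The third archetype (built-ins over bespoke code) can be illustrated with a small sketch. This is not one of the paper's Figure 1 examples; it is a hypothetical task of finding the k smallest values in a list:

```python
import heapq
import random

def bespoke_k_smallest(values, k):
    # Bespoke approach: repeated linear scans, O(n * k) in pure Python.
    values = list(values)
    out = []
    for _ in range(k):
        m = min(values)
        out.append(m)
        values.remove(m)
    return out

def builtin_k_smallest(values, k):
    # Built-in approach: heapq.nsmallest maintains a bounded heap,
    # O(n log k), with the hot loop implemented in C.
    return heapq.nsmallest(k, values)

data = [random.random() for _ in range(10_000)]
assert bespoke_k_smallest(data, 5) == builtin_k_smallest(data, 5)
```

Both functions compute the same result, but the built-in version is asymptotically better and avoids interpreter overhead, mirroring the kind of discrete design choice the model is meant to suggest.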

2.2. PROGRAM EDIT DATASET

To study this problem, we use the dataset from the Google Code Jam international competitive programming competition (Google Code-Jam). Each question consists of a problem description along with up to three test cases (inputs and desired outputs) of increasing complexity. For each question, the dataset contains solutions from competition participants. If a submission passes each test, it is labeled as correct and annotated with its run-time; otherwise it is marked as incorrect. For our study, we focus on the solutions that are written in Python. As we aim to study execution complexity, we consider only correct submissions and the run-time of the largest test case. The distribution of run-times is shown in Figure 2. Even for this constrained competitive programming task (with a natural focus on efficient solutions), Figure 2(a) illustrates a wide distribution in run-time, supporting our hypothesis that writing efficient code is challenging.
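The median-to-90th-percentile comparison used throughout the paper can be sketched as follows. The run-times below are synthetic stand-ins (a lognormal model chosen for illustration); the real distribution comes from the Code Jam dataset:

```python
import numpy as np

# Synthetic stand-in for one problem's correct-solution run-times
# (seconds). A heavy-tailed lognormal is assumed purely for
# illustration; the paper measures real submissions.
rng = np.random.default_rng(0)
runtimes = rng.lognormal(mean=0.0, sigma=0.7, size=5_000)

median = np.percentile(runtimes, 50)
p90 = np.percentile(runtimes, 90)
ratio = p90 / median  # the paper reports this ratio exceeding 2x
```

A ratio above 2 means a typical (median) solution runs more than twice as long as a 90th-percentile one, which is the gap the proposed hints aim to close.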

