BUSTLE: BOTTOM-UP PROGRAM SYNTHESIS THROUGH LEARNING-GUIDED EXPLORATION

Abstract

Program synthesis is challenging largely because of the difficulty of search in a large space of programs. Human programmers routinely tackle the task of writing complex programs by writing sub-programs and then analyzing their intermediate results to compose them in appropriate ways. Motivated by this intuition, we present a new synthesis approach that leverages learning to guide a bottom-up search over programs. In particular, we train a model to prioritize compositions of intermediate values during search conditioned on a given set of input-output examples. This is a powerful combination because of several emergent properties. First, in bottom-up search, intermediate programs can be executed, providing semantic information to the neural network. Second, given the concrete values from those executions, we can exploit rich features based on recent work on property signatures. Finally, bottom-up search allows the system substantial flexibility in what order to generate the solution, allowing the synthesizer to build up a program from multiple smaller sub-programs. Overall, our empirical evaluation finds that the combination of learning and bottom-up search is remarkably effective, even with simple supervised learning approaches. We demonstrate the effectiveness of our technique on two datasets, one from the SyGuS competition and one of our own creation.

1. INTRODUCTION

Program synthesis is a longstanding goal of artificial intelligence research (Manna & Waldinger, 1971; Summers, 1977), but it remains difficult in part because of the challenges of search (Alur et al., 2013; Gulwani et al., 2017). The objective in program synthesis is to automatically write a program given a specification of its intended behavior, and current state-of-the-art methods typically perform some form of search over a space of possible programs. Many different search methods have been explored in the literature, both with and without learning. These include search within a version-space algebra (Gulwani, 2011), bottom-up enumerative search (Udupa et al., 2013), stochastic search (Schkufza et al., 2013), genetic programming (Koza, 1994), reducing the synthesis problem to logical satisfiability (Solar-Lezama et al., 2006), beam search with a sequence-to-sequence neural network (Devlin et al., 2017), learning to perform premise selection to guide search (Balog et al., 2017), learning to prioritize grammar rules within top-down search (Lee et al., 2018), and learned search based on partial executions (Ellis et al., 2019; Zohar & Wolf, 2018; Chen et al., 2019). While these approaches have yielded significant progress, none of them fully captures the following important intuition: human programmers routinely write complex programs by first writing sub-programs and then analyzing their intermediate results to compose them in appropriate ways.

We propose a new learning-guided system for synthesis, called BUSTLE, which follows this intuition in a straightforward manner. Given a specification of a program's intended behavior (in this paper, input-output examples), BUSTLE performs bottom-up enumerative search for a satisfying program, following Udupa et al. (2013).
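As an illustration, bottom-up enumerative search maintains a pool of values built so far, organized by expression size, and repeatedly combines them with DSL operations, checking each new value against the target outputs. The sketch below is a minimal, hypothetical version over a two-operation toy DSL; the operations, the `" "` constant, and the choice to track values rather than full expressions are simplifications for exposition, not BUSTLE's actual implementation.

```python
# Minimal sketch of bottom-up enumerative search (in the style of
# Udupa et al., 2013) over a hypothetical two-operation string DSL.
# Each value is a tuple of per-example results; a real synthesizer
# would also track the expression that produced each value.

def bottom_up_search(inputs, outputs, max_size=4):
    """inputs: list of example inputs; outputs: desired example outputs."""
    # Size-1 expressions: the input variable and one string constant.
    pool = {1: {tuple(inputs), tuple(" " for _ in inputs)}}
    ops = [
        ("upper", 1, lambda a: a.upper()),
        ("concat", 2, lambda a, b: a + b),
    ]
    target = tuple(outputs)
    for size in range(2, max_size + 1):
        new_vals = set()
        for _name, arity, fn in ops:
            if arity == 1:
                # A unary application of size `size` takes a size-1 smaller argument.
                for v in pool.get(size - 1, ()):
                    new_vals.add(tuple(fn(x) for x in v))
            else:
                # A binary application splits the remaining size between arguments.
                for s1 in range(1, size - 1):
                    s2 = size - 1 - s1
                    for v1 in pool.get(s1, ()):
                        for v2 in pool.get(s2, ()):
                            new_vals.add(tuple(fn(a, b) for a, b in zip(v1, v2)))
        if target in new_vals:
            return target  # a satisfying expression was found at this size
        pool[size] = new_vals  # real implementations dedupe across sizes
    return None
```

BUSTLE's contribution is a learned model that reweights which of these intermediate values the search prioritizes, rather than exploring them purely in order of size.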
Each program explored during the bottom-up search is an expression that can be executed on the inputs, so we apply a machine learning model to the resulting value to guide the search. The model is simply a classifier trained to predict whether the intermediate value produced by a partial program is part of an eventual solution. This combination of learning and bottom-up search has several key advantages. First, because the input to the model is a value produced by executing a partial program, the model's predictions can depend on semantic information about the program. Second, because the search is bottom-up, the search procedure has more flexibility than previous work on execution-guided synthesis about the order in which to generate the program, and this flexibility can be exploited by machine learning.

A fundamental challenge in this approach is that exponentially many intermediate programs are explored during search, so the model needs to be both fast and accurate to yield wall-clock speedups. Some slowdown from model inference is acceptable, because a sufficiently accurate model lets the search examine far fewer values before finding solutions. However, in the domains we consider, executing a program is still orders of magnitude faster than performing inference with even a small machine learning model, so this challenge cannot be ignored. We employ two techniques to deal with it. First, we arrange both the synthesizer and the model so that model predictions can be batched across hundreds of intermediate values. Second, we process intermediate expressions using property signatures (Odena & Sutton, 2020), which featurize program inputs and outputs using another set of programs. A second challenge is that neural networks require large amounts of training data, but there is no available data source of intermediate expressions.
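To make the featurization concrete, a property signature reduces each (intermediate value, target output) pair to a vector with one entry per property, recording whether that property holds for all, some, or none of the examples. The three properties below are hypothetical stand-ins chosen for illustration; they are not the actual feature set of Odena & Sutton (2020) or of BUSTLE.

```python
# Illustrative property-signature featurization, in the spirit of
# Odena & Sutton (2020). Each "property" is itself a small program
# mapping an (intermediate value, target output) pair to a boolean.

PROPERTIES = [
    ("value is a substring of the output", lambda v, out: v in out),
    ("value is shorter than the output",   lambda v, out: len(v) < len(out)),
    ("value is all uppercase",             lambda v, out: v.isupper()),
]

ALL, SOME, NONE = 1.0, 0.5, 0.0  # one coarse feature per property

def property_signature(values, outputs):
    """values: per-example intermediate results; outputs: target outputs."""
    sig = []
    for _name, prop in PROPERTIES:
        results = [prop(v, out) for v, out in zip(values, outputs)]
        sig.append(ALL if all(results) else SOME if any(results) else NONE)
    return sig
```

Because each entry summarizes all examples at once, the signature has a fixed length regardless of how many input-output pairs the task provides, which is what makes it convenient as a neural network input.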
We can generate programs at random to train the model, following previous work (Balog et al., 2017; Devlin et al., 2017), but models trained on random programs do not always transfer to human-written benchmarks (Shin et al., 2019). We show that our use of property signatures helps with this distribution-mismatch problem as well.

In summary, this paper makes the following contributions:

• We present BUSTLE, which integrates machine learning into bottom-up program synthesis.

• We show how to efficiently add machine learning in the synthesis loop using property signatures and batched predictions. With these techniques, adding the model to the synthesizer provides an end-to-end improvement in synthesis time.

• We evaluate BUSTLE on two string-transformation datasets: one of our own design and one from the SyGuS competition. We show that BUSTLE leads to improvements in synthesis time compared to a baseline synthesizer without learning, a DeepCoder-style synthesizer (Balog et al., 2017), and an encoder-decoder model (Devlin et al., 2017). Even though our model is trained on random programs, we show that its performance transfers to a set of human-written synthesis benchmarks.

2.1. PROGRAMMING BY EXAMPLE

In a Programming-by-Example (PBE) task (Winston, 1970; Menon et al., 2013; Gulwani, 2011), we are given a set of input-output pairs, and the goal is to find a program that, when executed on each input, produces the corresponding output. To restrict the search space, programs are typically drawn from a small domain-specific language (DSL). As an example PBE specification, consider the "io_pairs" given in Listing 1.
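In general, such a specification pairs concrete inputs with desired outputs, and a candidate program is a solution only if it reproduces every pair. The following is a hypothetical illustration of the idea; both the pairs and the candidate program are invented here, not taken from Listing 1.

```python
# A hypothetical PBE specification: each pair maps an input string to
# its desired output. A candidate program solves the task only if it
# reproduces every pair.

io_pairs = [
    ("jacob ervin", "J.E."),
    ("richard hendricks", "R.H."),
]

def candidate(name):
    # Hypothetical solution: first letter of each word, uppercased,
    # each followed by a period.
    return "".join(w[0].upper() + "." for w in name.split())

assert all(candidate(inp) == out for inp, out in io_pairs)
```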

2.2. OUR STRING TRANSFORMATION DSL

Following previous work (Gulwani, 2011; Devlin et al., 2017), we deal with string and number transformations commonly used in spreadsheets. Such transformations sit at a nice point on the complexity spectrum for a benchmark task: they are simpler than programs in general-purpose languages, but still expressive enough for many common string transformation tasks. The domain-specific language we use (shown in Figure 1) is broadly similar to those of Parisotto et al. (2017) and Devlin et al. (2017), but compared to these, our DSL is expanded in several ways that make the synthesis task more difficult. First, in addition to string manipulation, our DSL includes integers, integer arithmetic, booleans, and conditionals. Second, our DSL allows arbitrarily nested expressions, rather than imposing a maximum size. Finally, and most importantly, previous works (Gulwani, 2011; Devlin et al., 2017) impose the restriction that Concat be the top-level operation. With this constraint, such approaches use version space algebras or dynamic programming

