REPRESENTING PARTIAL PROGRAMS WITH BLENDED ABSTRACT SEMANTICS

Abstract

Synthesizing programs from examples requires searching over a vast, combinatorial space of possible programs. In this search process, a key challenge is representing the behavior of a partially written program before it can be executed, to judge if it is on the right track and predict where to search next. We introduce a general technique for representing partially written programs in a program synthesis engine. We take inspiration from the technique of abstract interpretation, in which an approximate execution model is used to determine if an unfinished program will eventually satisfy a goal specification. Here we learn an approximate execution model implemented as a modular neural network. By constructing compositional program representations that implicitly encode the interpretation semantics of the underlying programming language, we can represent partial programs using a flexible combination of concrete execution state and learned neural representations, using the learned approximate semantics when concrete semantics are not known (in unfinished parts of the program). We show that these hybrid neuro-symbolic representations enable execution-guided synthesizers to use more powerful language constructs, such as loops and higher-order functions, and can be used to synthesize programs more accurately for a given search budget than pure neural approaches in several domains.

1. INTRODUCTION

Inductive program synthesis, the problem of inferring programs from examples, offers the promise of building machine learning systems that are interpretable, generalize quickly, and allow us to automate software engineering tasks. In recent years, neurally-guided program synthesis, which uses deep learning to guide search over the space of possible programs, has emerged as a promising approach (Balog et al., 2016; Devlin et al., 2017). In this framework, partially-constructed programs are judged to determine if they are on the right track and to predict where to search next. A key challenge in neural program synthesis is representing the behavior of partially written programs in order to make these judgments. In this work, we present a novel method for representing the semantic content of partially written code, which can be used to guide search to solve program synthesis tasks.

Consider a tower construction domain in which a hand drops blocks, Tetris-style, onto a vertical 2D scene (Figure 1). In this domain, a function buildColumn(n) stacks n vertically-oriented blocks at the current cursor location, and moveHand(n) moves the cursor n spaces to the right. Given an image X of a scene, our task is to write a program which builds a tower matching the image X. To do this, a model can perform search in the space of programs, iteratively adding code until the program is complete. While attempting to synthesize a program, imagine arriving at a partially-constructed program s (short for sketch), where HOLE signifies unfinished code:

s = loop(4, [buildColumn(1), moveHand(<HOLE>)])

Note that this partial program cannot reach the goal state: the target image has columns of height 2, but this program can only build columns of height 1.
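The tower domain above can be made concrete with a minimal interpreter sketch. The operation names buildColumn, moveHand, and loop come from the text; the scene representation (a list of column heights) and the instruction encoding are assumptions made here for illustration.

```python
# Minimal sketch of the tower-building DSL from the running example.
# Assumption: a scene is a fixed-width list of column heights, and the
# hand cursor is an index into it.

def build_column(scene, cursor, n):
    """Stack n vertically-oriented blocks at the cursor location."""
    scene = list(scene)
    scene[cursor] += n
    return scene, cursor

def move_hand(scene, cursor, n):
    """Move the hand cursor n spaces to the right."""
    return scene, cursor + n

def loop(n, body):
    """Unroll loop(n, body) into a flat instruction list."""
    return body * n

def run(program, width=8):
    """Concretely execute a list of (operation, argument) instructions."""
    scene, cursor = [0] * width, 0
    for op, arg in program:
        scene, cursor = op(scene, cursor, arg)
    return scene

# The completed program loop(4, [buildColumn(1), moveHand(2)]) builds
# four height-1 columns, two cells apart:
prog = loop(4, [(build_column, 1), (move_hand, 2)])
print(run(prog))  # -> [1, 0, 1, 0, 1, 0, 1, 0]
```

Concrete execution like this is only defined once every argument is filled in; with a HOLE in place of an argument, run cannot be applied, which is exactly the representational gap discussed next.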
For an algorithm to determine if it should expand s or explore another part of the search space, it needs to determine whether s is on track to satisfy the goal. Answering this question requires an effective representation of partial programs.

Existing neural program synthesis techniques differ in how they represent programs. Some represent programs by their syntax (Devlin et al., 2017; Allamanis et al., 2018), producing vector representations of program structure using sequence or graph neural networks. Recently, approaches which instead represent partial programs via their semantic state have been shown to be particularly effective. In these execution-guided neural synthesis approaches (Chen et al., 2018; Ellis et al., 2019; Zohar & Wolf, 2018), partial programs are executed and represented with their return values. (To see why this is helpful, consider two distinct syntactic expressions 2 + 1 and 6/2; a syntax-based model might assign them different representations, whereas a model using a semantic representation will represent both as equivalent to 3.) However, execution is not always possible for a partial program. In our running example, before the HOLE is filled with an integer value, we cannot meaningfully execute the partially-written loop in s. This is a common problem for languages containing higher-order functions and control flow, where execution of partially written code is often ill-defined.¹ Thus, a key question is: How might we represent the semantics of unfinished code?

A classic method for representing program state, known as abstract interpretation (Cousot & Cousot, 1977), can be used to reason about the set of states that a partial program could reach, given the possible instantiations of the unfinished parts of the program. Using abstract interpretation, an approximate execution model can determine whether an unfinished program can eventually satisfy a goal specification. For example, in the tower-building domain, an abstract interpreter could be designed to track, for every horizontal location, the minimum tower height that all continuations are guaranteed to exceed. However, this technique is often low-precision: hand-designed abstract execution models greatly overapproximate the set of possible execution states, and do not automatically adapt themselves to the strengths or weaknesses of specific search algorithms. We hypothesize that, by mimicking the compositional structure of abstract interpretation, learned components can be used to effectively represent ambiguous program state.

In this work, we make two contributions. First, we introduce neural abstract semantics, in which a compositional, approximate execution model is used to represent partially written code. Second, we extend this approach to blended abstract semantics, which aims to represent the state of unfinished programs as faithfully as possible by concretely executing program components whenever possible, and otherwise approximating program state with a learned abstract execution model.

¹ See Peleg et al. (2020) for a discussion in the context of bottom-up synthesis.

Figure 1: Schematic overview of the search procedure and representational scheme. We characterize program synthesis as a goal-conditioned search through the space of partial programs (left), and propose a novel representational scheme (blended abstract semantics) to facilitate this search process. Left: a particular trajectory through the space of partial programs, where the goal is to find a program satisfying the target image. Right: three encoding schemes for partial programs, which can each be used as the basis of a code-writing search policy and code-assessing value function.
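To illustrate how abstract interpretation can prune the sketch s, the following sketch uses a hand-coded abstraction where the paper proposes learned neural modules. It computes an upper bound on the height any single column can reach under every completion of the holes, assuming (as in the pruning argument above) that moveHand arguments are positive, so the cursor only ever moves right and each column is built by at most one loop iteration. The function names and program encoding are illustrative assumptions, not the paper's implementation.

```python
# Hand-designed abstract semantics for sketches of the form
# loop(n, body), standing in for the learned abstract execution
# model described in the text.
import math

HOLE = None  # marker for an unfinished argument

def max_column_height(loop_body):
    """Upper bound on any single column's height, over every way of
    filling the holes. Assumption: moveHand arguments are positive, so
    the cursor strictly increases between iterations and each column
    receives blocks from at most one iteration of the loop body."""
    bound = 0
    for op, arg in loop_body:
        if op == "buildColumn":
            # A hole could be filled with any positive integer, so the
            # abstract bound for it is unbounded.
            bound += math.inf if arg is HOLE else arg
    return bound

def consistent(sketch_body, target_heights):
    """Keep the sketch only if some completion could reach the target."""
    return max_column_height(sketch_body) >= max(target_heights)

# The running example s = loop(4, [buildColumn(1), moveHand(HOLE)])
# can never reach a target with columns of height 2, so it is pruned:
s_body = [("buildColumn", 1), ("moveHand", HOLE)]
print(consistent(s_body, target_heights=[2, 0, 2, 0]))  # -> False
```

This abstraction is sound but coarse, in the way the text criticizes: it collapses all completions into a single bound and cannot adapt to a particular search algorithm. Blended abstract semantics instead executes hole-free subtrees concretely (as in run above) and applies learned abstract modules only where a HOLE makes concrete execution ill-defined.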

