NEURALLY GUIDED GENETIC PROGRAMMING FOR TURING COMPLETE PROGRAMMING BY EXAMPLE

Abstract

The ability to synthesise source code from input/output examples allows non-experts to generate programs, and experts to abstract away a wide range of simple programming tasks. Current research in this area has explored neural synthesis, SMT solvers, and genetic programming; each of these approaches is limited, however, often relying on highly specialised target languages for synthesis. In this paper we present a novel hybrid approach in which neural networks guide genetic programming (GP), allowing us to successfully synthesise code from just ten I/O examples in a generalised Turing-complete target language, up to and including a sorting algorithm. We show that GP by itself is able to synthesise a set of simple programs, and identify which hints (suggested lines of code for inclusion) are of most utility to GP in solving harder problems. Using a form of unstructured curriculum learning, we then demonstrate that neural networks can be used to determine when to make use of these high-utility hints for specific I/O problems, and thus enable complex functions to be successfully synthesised. We apply our approach to two different problem sets: common array-to-array programs (including sorting), and a canvas drawing problem set inspired by So & Oh (2018).

1. INTRODUCTION

The ability to synthesise source code from examples, in which an implementation of a function is created based on one or more demonstrations of an input-output mapping, is a fundamental challenge in machine learning. We study this challenge specifically in scenarios where large corpora of existing code (e.g., human-written programs in open-source repositories) are not available. Its immediate applications would allow non-programmers to generate programs, and experts to abstract away trivial coding tasks. From a machine learning perspective, it also allows complex functions to be generated in a symbolic, human-readable form, which can be subjected to a wide range of static analysis tools to assess generality or correctness. To date this challenge has been studied using neural-network-driven synthesis, genetic programming, and SMT solvers. At present, however, these approaches are significantly constrained in the complexity of the target language in which code is synthesised. Neural synthesis, such as the DeepCoder architecture (Balog et al., 2017; Zohar & Wolf, 2018), shows success on simple domain-specific languages, but the search space of more realistic Turing-complete languages is vast by comparison and is unlikely to be representable in a neural network on current or near-future hardware. Genetic programming, meanwhile, is limited by our ability to specify a fitness function which can successfully navigate to a solution for a particular problem in the highly irregular and often flat fitness landscape of program space (Kinnear, 1994; Renzullo et al., 2018). SMT solvers, by comparison, lack the analytical power to handle loops without human guidance, constraining their applicability (Srivastava et al., 2010a; So & Oh, 2018; Srivastava et al., 2010b). In this paper we examine code synthesis from examples for a Turing-complete language which can be cross-compiled into C/Java.
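As a concrete illustration of this problem setting (our sketch, not the paper's implementation), a task such as sorting can be specified purely by input/output pairs, and a candidate program is accepted exactly when it reproduces every output:

```python
import random

# Illustrative only: a task is specified by nothing but I/O pairs, here
# ten unsorted/sorted integer arrays describing "sort". All names and
# sizes below are our own invention.
rng = random.Random(0)
examples = [(xs, sorted(xs))
            for xs in ([rng.randint(-100, 100) for _ in range(8)]
                       for _ in range(10))]

def satisfies(program, examples):
    # A candidate counts as a solution iff it reproduces every output.
    return all(program(list(inp)) == out for inp, out in examples)

assert satisfies(sorted, examples)             # any correct sort passes
assert not satisfies(lambda xs: xs, examples)  # the identity does not
```

Note that the specification is purely behavioural: any program with the right input-output behaviour is a solution, regardless of its source text.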
We use just ten input/output examples to describe each problem for which we need to synthesise a matching function (e.g., providing unsorted and sorted arrays of integers to describe sorting); because our target language yields a total search space of 5 × 10^119 possible program permutations, scalability of the synthesis technique is crucial. In this context, we use a novel combination of genetic programming (GP) and neural networks (NNs): GP is used to navigate program space from a given starting point using a general-purpose fitness function, while NN methods predict high-level features which guide the GP towards higher-probability areas in which to search. In essence, this technique allows the NN to model only highly abstract features of program space, allowing it to scale to vast search spaces, while the GP can then incrementally traverse program space from an NN-derived starting point to a correct solution. We bootstrap this process using an unstructured form of curriculum learning, in which successfully found functions are used as seed programs to generate synthetic corpora on which to train new neural networks, leading to further high-utility source code feature predictions for new problems. Initially this curriculum learning is based on programs that can be found using GP alone with our generic fitness function, which then allows us to synthesise more complex programs using NN inference. Our key results demonstrate that GP is able to solve simple synthesis problems unaided, and that synthetic corpora generated from these problems allow popular neural network architectures to identify high-utility search hints on more complex problems to guide the genetic programming search.
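The interaction between the two components can be sketched in miniature (an illustration, not our actual system: the primitives, the toy task, and the single-parent mutation loop are simplified stand-ins, and the NN's role is reduced to a fixed hint list):

```python
import random

# Toy guided search: "hints" stand in for NN-predicted lines of code that
# seed the GP's starting point. Primitive names and the task are invented.
PRIMITIVES = {
    "inc":     lambda xs: [x + 1 for x in xs],
    "dec":     lambda xs: [x - 1 for x in xs],
    "double":  lambda xs: [x * 2 for x in xs],
    "reverse": lambda xs: xs[::-1],
}

def run(program, xs):
    # A program is a sequence of primitive names applied left to right.
    for op in program:
        xs = PRIMITIVES[op](xs)
    return xs

def fitness(program, examples):
    # General-purpose fitness: negated distance between produced and
    # expected outputs; 0 means every example is matched exactly.
    return -sum(abs(a - b)
                for inp, out in examples
                for a, b in zip(run(program, list(inp)), out))

def guided_search(examples, hints, budget=300, seed=0):
    rng = random.Random(seed)
    ops = list(PRIMITIVES)
    best = list(hints)                   # NN hints give the starting point
    best_fit = fitness(best, examples)
    for _ in range(budget):
        if best_fit == 0:
            break                        # all I/O examples satisfied
        child = list(best)
        r = rng.random()
        if r < 0.4 or not child:         # insert a primitive
            child.insert(rng.randrange(len(child) + 1), rng.choice(ops))
        elif r < 0.7:                    # delete a primitive
            del child[rng.randrange(len(child))]
        else:                            # replace a primitive
            child[rng.randrange(len(child))] = rng.choice(ops)
        f = fitness(child, examples)
        if f >= best_fit:
            best, best_fit = child, f
    return best, best_fit

# Ten I/O examples describing "reverse the array, then add one to each".
data_rng = random.Random(1)
examples = [(xs, [x + 1 for x in reversed(xs)])
            for xs in ([data_rng.randint(-50, 50) for _ in range(6)]
                       for _ in range(10))]

# Seeding the search with the hint "reverse" leaves only a short, easily
# navigated path to a program satisfying all ten examples.
program, fit = guided_search(examples, hints=["reverse"])
assert fit == 0 and run(program, [3, 1, 2]) == [3, 2, 4]
```

The point of the sketch is the division of labour: the hint collapses the distance the incremental search must cover, without the "NN" having to model the whole program space.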
In one of our problem sets, this allows the framework to successfully synthesise 7 of the 10 programs that were never found by GP alone, and for other programs raises success rates from 38% to 55%. All of our source code is provided online (pending de-anonymisation).

2. RELATED WORK

Code synthesis from I/O examples has been studied using three major approaches: deductive solvers; neural networks (NNs) with search; and genetic programming (GP). For code synthesis in a Turing-complete language, deductive solvers have not yet been shown to operate well with loop-based flow-control operators (although frameworks which manually define any non-linear program flow can yield good performance (So & Oh, 2018)). Neural synthesis, by comparison, is limited by how much of program space for a general-purpose target language can be captured in an NN model, while GP is limited by the difficulty of deriving a fitness function to navigate to a solution (Renzullo et al., 2018). In the remainder of this section we focus on NN and GP approaches in more detail.

Neural synthesis. Neural synthesis works by training a neural network on a sub-sample of the entirety of program space for the target language (e.g., sampling at a uniform interval or at random). When presented with a new problem as an I/O example, the neural network is asked to predict which lines of code (or particular operators) are likely to be present in the solution, based on similar I/O transforms observed in the training set drawn from the above sub-sample. The system then performs an exhaustive search of program space to fill in the remaining (non-predicted) features. Notable examples here include DeepCoder and RobustFill, among others (e.g., Balog et al. (2017); Zohar & Wolf (2018); Devlin et al. (2017); Chen et al. (2019); Singh & Kohli (2017)). The key limitation of this approach is that it must be able to train on a detailed enough sub-sample of program space, and store this sample inside a neural network, to make meaningful predictions of program features for unseen problems. While this works for highly simplified languages (DeepCoder, for example, has no loop operators (Balog et al., 2017)), the search space of a Turing-complete language is astronomical by comparison.
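This predict-then-search pipeline can be sketched as follows (a toy illustration, not DeepCoder or RobustFill themselves: the primitive set is invented, and the "neural" predictor is a hard-coded stub standing in for a trained network):

```python
from itertools import product

# Toy predict-then-search: an operator-probability prediction prunes an
# otherwise exhaustive enumeration. All names here are our own invention.
PRIMITIVES = {
    "inc":     lambda xs: [x + 1 for x in xs],
    "negate":  lambda xs: [-x for x in xs],
    "sort":    lambda xs: sorted(xs),
    "reverse": lambda xs: xs[::-1],
    "head3":   lambda xs: xs[:3],
}

def predict_op_scores(examples):
    # Stub standing in for the NN: scores how likely each operator is to
    # appear in the solution. In the real systems this is inferred from
    # the I/O pairs by a network trained on sampled programs.
    return {"sort": 0.9, "reverse": 0.8, "inc": 0.2,
            "negate": 0.1, "head3": 0.1}

def enumerate_programs(examples, top_k=2, max_len=3):
    # Restrict the exhaustive search to the top-k predicted operators,
    # trying programs in order of increasing length.
    scores = predict_op_scores(examples)
    ops = sorted(scores, key=scores.get, reverse=True)[:top_k]
    for length in range(1, max_len + 1):
        for program in product(ops, repeat=length):
            outs = [list(inp) for inp, _ in examples]
            for op in program:
                outs = [PRIMITIVES[op](xs) for xs in outs]
            if outs == [out for _, out in examples]:
                return list(program)
    return None

# Two I/O examples describing "sort in descending order".
examples = [([3, 1, 2], [3, 2, 1]), ([5, -1, 4], [5, 4, -1])]
assert enumerate_programs(examples) == ["sort", "reverse"]
```

Enumeration cost grows as top_k to the power max_len, so the prediction's entire job is to keep top_k small; this is also why the approach strains once the target language's operator set and program lengths grow.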
If we consider that the capacity of a feed-forward ReLU neural network to differentiate between classes (in our case programs), termed its Vapnik-Chervonenkis (VC) dimension, at best grows as a linear function of w * L * log(w), where w is the total number of weights and L the number of layers (Bartlett et al., 2019), it is unlikely that a neural network on current hardware would be able to represent a useful sub-sample of the possible program permutations yielded by the search space of a general-purpose language.

Genetic programming. GP relies on iterative travel through program space from a starting point (often an empty program) to the solution, guided by a fitness function (Vanneschi & Poli, 2012; Taleby Ahvanooey et al., 2019). The field has a long history (Forsyth, 1981) but still shows results (Miranda et al., 2019) that are competitive with neural networks (Ain et al., 2020), and an ability to tackle complex problems mixing diverse datatypes (Pantridge & Spector, 2020). Unlike neural synthesis, a GP approach does not need to encode the entirety of program space in a model, and so can in theory work in a scalable fashion on high-dimensional search spaces, as long as a fitness function is provided which can guide the search incrementally towards a solution. The key problem with GP for code synthesis is that large areas of program space are difficult to navigate, given the highly irregular and often flat fitness landscape of program space (Kinnear, 1994; Renzullo et al., 2018).
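As a rough illustration of the scale of this gap (the network sizes below are arbitrary assumptions, and constant factors in the bound are ignored):

```python
import math

# Back-of-envelope arithmetic: evaluate the w * L * log(w) growth rate for
# an extremely large hypothetical network and compare it against the
# 5 * 10^119 program permutations in our target language's search space.
w = 1e12                           # total weights (a trillion-weight net)
L = 1_000                          # layers
vc_upper = w * L * math.log2(w)    # roughly 4e16, ignoring constants

search_space = 5e119               # possible program permutations
print(f"VC-style capacity ~ {vc_upper:.1e} vs {search_space:.0e} programs")
assert vc_upper < search_space     # short by over a hundred orders of magnitude
```

Even with these deliberately generous network sizes, the capacity estimate falls vastly short of the number of distinct programs, which motivates using the network only for abstract feature prediction rather than for modelling program space directly.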




