NEURALLY GUIDED GENETIC PROGRAMMING FOR TURING COMPLETE PROGRAMMING BY EXAMPLE

Abstract

The ability to synthesise source code from input/output examples allows non-experts to generate programs, and experts to abstract away a wide range of simple programming tasks. Current research in this area has explored neural synthesis, SMT solvers, and genetic programming; each of these approaches is limited, however, often using highly specialised target languages for synthesis. In this paper we present a novel hybrid approach using neural networks to guide genetic programming (GP), which allows us to successfully synthesise code from just ten I/O examples in a generalised Turing-complete target language, up to and including a sorting algorithm. We show that GP by itself is able to synthesise a set of simple programs, and show which hints (suggested lines of code for inclusion) are of most utility to GP in solving harder problems. Using a form of unstructured curriculum learning, we then demonstrate that neural networks can be used to determine when to make use of these high-utility hints for specific I/O problems and thus enable complex functions to be successfully synthesised. We apply our approach to two different problem sets: common array-to-array programs (including sorting), and a canvas drawing problem set inspired by So & Oh (2018).

1. INTRODUCTION

The ability to synthesise source code from examples, in which a source code implementation of a function is created based on one or more demonstrations of input-output mapping, is a fundamental question in machine learning. We specifically study this question in scenarios where large corpora of existing code (e.g., human-written programs in open-source repositories) are not available. The immediate applications would allow non-programmers to generate programs, and experts to abstract away trivial coding tasks. In addition, from a machine learning perspective, it allows complex functions to be generated in a symbolic and human-readable form, which can be subjected to a wide range of static analysis tools to model generality or correctness. To date this challenge has been studied using neural-network-driven synthesis, genetic programming, and SMT solvers. However, at present these approaches are significantly constrained in the complexity of the target language in which code is synthesised. Neural synthesis, for example the DeepCoder architecture (Balog et al., 2017; Zohar & Wolf, 2018), shows success on simple domain-specific languages, but the search space of more realistic Turing-complete languages is vast by comparison and is unlikely to be representable in a neural network on current or near-future hardware. Genetic programming, meanwhile, is limited by our ability to specify a fitness function which can successfully navigate to a solution for a particular problem in the highly irregular and often flat fitness landscape of program space (Kinnear, 1994; Renzullo et al., 2018). SMT solvers, by comparison, lack the analytical power to handle loops without human guidance, constraining their applicability (Srivastava et al., 2010a; So & Oh, 2018; Srivastava et al., 2010b). In this paper we examine code synthesis from examples for a Turing-complete language which can be cross-compiled into C/Java.
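To make the programming-by-example setting concrete, the following is a minimal illustrative sketch in Python (not the target language studied in this paper) of how a problem such as sorting might be specified purely by input/output pairs, and how a candidate function could be checked against them. The helper names `make_sorting_examples` and `satisfied_fraction`, the array sizes, and the value ranges are assumptions made for illustration only; in particular, the scoring shown here is a simple pass rate and is not the fitness function used in our experiments.

```python
import random
from typing import Callable, List, Tuple

# One I/O example: an input array and the output array the target function should produce.
Example = Tuple[List[int], List[int]]


def make_sorting_examples(n: int = 10, seed: int = 0) -> List[Example]:
    """Build n (unsorted, sorted) pairs. Sizes and value ranges are illustrative assumptions."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        xs = [rng.randint(0, 99) for _ in range(rng.randint(2, 8))]
        examples.append((xs, sorted(xs)))
    return examples


def satisfied_fraction(candidate: Callable[[List[int]], List[int]],
                       examples: List[Example]) -> float:
    """Fraction of examples the candidate reproduces exactly (hypothetical scoring)."""
    passed = 0
    for inp, expected in examples:
        try:
            # Pass a copy so an in-place candidate cannot corrupt the specification.
            if candidate(list(inp)) == expected:
                passed += 1
        except Exception:
            # A crashing candidate simply fails this example.
            pass
    return passed / len(examples)


def bubble_sort(xs: List[int]) -> List[int]:
    """A hand-written candidate standing in for a synthesised program."""
    xs = list(xs)
    for i in range(len(xs)):
        for j in range(len(xs) - 1 - i):
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs


def identity(xs: List[int]) -> List[int]:
    """A trivial candidate that returns its input unchanged."""
    return xs


examples = make_sorting_examples()
print(satisfied_fraction(bubble_sort, examples))  # 1.0: every example is reproduced
print(satisfied_fraction(identity, examples))     # typically low: only already-sorted inputs pass
```

A specification of this kind says nothing about how the function should be implemented; the synthesis technique must discover the loops and comparisons itself, which is what makes the size of the search space the central difficulty.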
We use just 10 input/output examples to describe each problem for which we need to synthesise a matching function (e.g., providing unsorted and sorted arrays of integers to describe sorting); because our target language yields a total search space size of 5 × 10^119 possible program permutations, scalability of the synthesis technique is crucial. In this context, we use a novel combination of genetic programming (GP) and neural networks (NNs); GP is used to navigate within program space from a given starting point using a general-purpose fitness function,

