GEOMETRY OF PROGRAM SYNTHESIS

Abstract

We present a new perspective on program synthesis in which programs may be identified with singularities of analytic functions. As an example, Turing machines are synthesised from input-output examples by propagating uncertainty through a smooth relaxation of a universal Turing machine. The posterior distribution over weights is approximated using Markov chain Monte Carlo, and bounds on the generalisation error of these models are estimated using the real log canonical threshold, a geometric invariant from singular learning theory.

1. INTRODUCTION

The idea of program synthesis dates back to the birth of modern computation itself (Turing, 1948) and is recognised as one of the most important open problems in computer science (Gulwani et al., 2017). However, there appear to be serious obstacles to synthesising programs by gradient descent at scale (Neelakantan et al., 2016; Kaiser & Sutskever, 2016; Bunel et al., 2016; Gaunt et al., 2016; Evans & Grefenstette, 2018; Chen et al., 2018), and these problems suggest that it would be appropriate to make a fundamental study of the geometry of loss surfaces in program synthesis, since this geometry determines the learning process. To that end, in this paper we explain a new point of view on program synthesis using the singular learning theory of Watanabe (2009) and the smooth relaxation of Turing machines from Clift & Murfet (2018). In broad strokes this new geometric point of view on program synthesis says:

• Programs to be synthesised are singularities of analytic functions. If U ⊆ R^d is open and K : U → R is analytic, then x ∈ U is a critical point of K if ∇K(x) = 0, and a singularity of the function K if it is a critical point where K(x) = 0.

• The Kolmogorov complexity of a program is related to a geometric invariant of the associated singularity called the Real Log Canonical Threshold (RLCT). This invariant controls both the generalisation error and the learning process, and is therefore an appropriate measure of "complexity" in continuous program synthesis. See Section 3.

• The geometry has concrete practical implications. For example, an MCMC-based approach to program synthesis will find, with high probability, a solution that is of low complexity (if it finds a solution at all). We sketch a novel point of view on the problem of "bad local minima" (Gaunt et al., 2016) based on these ideas. See Section 4.

We demonstrate all of these principles in experiments with toy examples of synthesis problems.

Program synthesis as inference.
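The distinction drawn above between critical points and singularities can be checked symbolically. The following is a minimal sketch (our own illustration, not code from the paper): the function K(u, v) = u²v² has a singularity at the origin, while K(u, v) = u² + (v − 1)² + 1 has a critical point at (0, 1) that is not a singularity, since K(0, 1) = 1 ≠ 0.

```python
# Illustration of the definitions: x is a critical point of K if grad K(x) = 0,
# and a singularity of K if additionally K(x) = 0.
import sympy as sp

u, v = sp.symbols("u v", real=True)

def classify(K, point):
    """Return 'singularity', 'critical point' or 'regular point' of K at point."""
    subs = dict(zip((u, v), point))
    grad = [sp.diff(K, s).subs(subs) for s in (u, v)]
    if all(g == 0 for g in grad):
        return "singularity" if K.subs(subs) == 0 else "critical point"
    return "regular point"

print(classify(u**2 * v**2, (0, 0)))            # → singularity
print(classify(u**2 + (v - 1)**2 + 1, (0, 1)))  # → critical point
```

The function names here are hypothetical; the point is only that the singularity condition is a closed condition (two equations) on top of the critical-point condition.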
We use Turing machines, but mutatis mutandis everything applies to other programming languages. Let T be a Turing machine with tape alphabet Σ and set of states Q, and assume that on any input x ∈ Σ* the machine eventually halts with output T(x) ∈ Σ*. To the machine T we may then associate the set {(x, T(x)) | x ∈ Σ*} ⊆ Σ* × Σ*. Program synthesis is the study of the inverse problem: given a subset of Σ* × Σ*, we would like to determine (if possible) a Turing machine which computes the given outputs on the given inputs. If we are given a probability distribution q(x) on Σ*, then we can formulate this as a problem of statistical inference: given a probability distribution q(x, y) on Σ* × Σ*, determine the most likely machine producing the observed distribution q(x, y) = q(y|x)q(x). If we fix a universal Turing machine U, then Turing machines can be parametrised by codes w ∈ W^code with U(x, w) = T(x) for all x ∈ Σ*. We let p(y|x, w) denote the probability of U(x, w) = y (which is either zero or one).¹
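The inference formulation above can be made concrete with a toy example. The sketch below is our own construction, not the paper's universal Turing machine: a tiny "machine" U(x, w) is parametrised by a code w drawn from a small code space, the deterministic likelihood p(y | x, w) is one exactly when U(x, w) = y, and the codes consistent with a set of input-output examples are those with nonzero likelihood (equivalently, nonzero posterior mass under a uniform prior).

```python
# Toy version of program synthesis as inference: enumerate codes w in a small
# code space and score each by the product of likelihoods p(y | x, w) over the
# observed input-output examples.
from itertools import product

def U(x, w):
    """Toy 'universal machine': the code w = (flip, shift) acts on a bit string x."""
    flip, shift = w
    bits = [b ^ flip for b in x]               # optionally negate every bit
    return tuple(bits[shift:] + bits[:shift])  # optionally rotate the string

CODES = list(product((0, 1), repeat=2))  # the code space, playing the role of W^code

def p(y, x, w):
    """Deterministic likelihood: 1 if U(x, w) = y, else 0."""
    return 1.0 if U(x, w) == y else 0.0

# Observed examples, generated here by the hidden code w* = (1, 0).
examples = [((0, 1, 1), (1, 0, 0)), ((1, 1, 0), (0, 0, 1))]

scores = {}
for w in CODES:
    likelihood = 1.0
    for x, y in examples:
        likelihood *= p(y, x, w)
    scores[w] = likelihood

consistent = [w for w, s in scores.items() if s > 0]
print(consistent)  # → [(1, 0)]
```

With a uniform prior the posterior is uniform on the consistent codes, so inference here reduces to enumeration; the paper's setting replaces this discrete search with a smooth relaxation over which gradient-based and MCMC methods can operate.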

