International Summer School on Metaprogramming

Robinson College, Cambridge, 8^th to 12^th August 2016

Normalisation and embedding

Philip Wadler, Sam Lindley and Shayan Najd (University of Edinburgh)

We discuss a range of approaches to domain specific languages (DSLs), centred on the ideas of embedding and normalisation. We focus on embedded DSLs that generate programs in other languages, for instance, an F# program that generates SQL for database queries (LINQ) or a Haskell program that generates C for signal processing (Feldspar). Normalisation is a popular tool used, often implicitly, in many embedded DSLs. Our approaches are based on the subformula principle introduced by Gentzen in the 1930s, and normalisation by evaluation applied by Martin-Löf in the 1970s and rediscovered by Berger and Schwichtenberg in the 1990s.

This course is in three parts. Part I motivates normalisation and the subformula principle for embedding domain-specific languages through quoted domain specific languages. Part II introduces normalisation by evaluation. Part III synthesises the key ideas from Part I and Part II as embedding by normalisation.

Part I: Everything old is new again: Quoted domain specific languages

(Philip Wadler)

Fashions come, go, return. We describe a new approach to domain specific languages, called QDSL, that resurrects two old ideas: quoted terms for domain specific languages, from McCarthy's Lisp of 1960, and the subformula principle, from Gentzen's natural deduction of 1935. Quoted terms allow the domain specific language to share the syntax and type system of the host language. Normalising quoted terms ensures the subformula principle, which provides strong guarantees, e.g., that one can use higher-order or nested code in the source while guaranteeing first-order or flat code in the target, or using types guide loop fusion. We give two applications of these ideas.

First, we embed SQL using Microsoft's LINQ framework for F#. Our version of LINQ supports abstraction over values and predicates, composition of queries, dynamic generation of queries, and queries with nested intermediate data. Higher-order features prove useful even for constructing first-order queries. We prove that normalisation always succeeds in translating any query of flat relation type to SQL. We present experimental results confirming our technique works, even in situations where Microsoft's LINQ framework either fails to produce an SQL query or, in one case, produces an avalanche of SQL queries.

Second, we re-implement Feldspar in Haskell. Feldspar was originally implemented as an Embedded DSL (EDSL). We compare the QDSL and EDSL variants.

Part II: Normalisation by evaluation

(Sam Lindley)

Given any equational theory we can define a notion of normal form as a canonical representative of an equivalence class. Normalisation by evaluation (NBE) is a technique for computing such normal forms by first evaluating a term in a suitable denotational semantics and then reifying the resulting semantic object as a term. Soundness and completeness of the model guarantees that the term is a normal form.

NBE crops up in many guises. For instance, it has been studied in the context of proof theory, type theory, category theory, partial evaluation, and program transformation. A central feature of NBE is that it provides a semantic foundation for normalisation, a procedure that is more often treated entirely syntactically.

We present an introduction to the theory and practice of NBE. We cover intensional and extensional NBE for lambda calculus, NBE for computations, NBE for sums, NBE as type-directed partial evaluation, and the relationship between NBE and reduction-based normalisation.

Part III: Embedding by normalisation

(Shayan Najd)

We introduce embedding by normalisation (EBN), as a semantic approach to analysing and structuring domain specific languages, based on normalisation by evaluation (NBE). NBE has five components: a syntactic domain, a class of normal forms of the syntactic domain, a semantic domain, an interpretation of syntactic terms as semantic values, and a mapping from semantic values to normal forms. EBN also has five components: an object language, a target sublanguage of the object language, a host language representation of the object language terms, an encoding of object language terms in this representation, and a means for extracting target terms from the host language representation. Each of the five components of NBE and EBN correspond to one another.

A practical benefit of EBN is that it enables us to directly apply techniques from NBE. We adapt the treatment of sums from NBE to EBN, resolving an open problem. Similarly, we adapt the treatment of constants from NBE to EBN.

We introduce a generic framework for EBN. The framework supports DSLs built on lambda calculus with sums, and customisable interpretations of constants. Using the framework, we present an EBN implementation of Feldspar.

Type-safe embedding and optimizing domain-specific languages in the typed final style

Oleg Kiselyov (Tohoku University)

Tagless-final style is a general method of embedding (typed, higher-order) domain-specific languages (DSL) in a typed functional language such as Haskell, OCaml, Scala and Coq. Once written, a DSL expression may be interpreted many times. Evaluator, compiler, partial evaluator, pretty-printer are all interpreters. At any time one may add more interpreters, and even more expression forms to the DSL while re-using the previous interpreters as they were. Only well-typed DSL terms are representable. Therefore, the type checker of the host language checks -- and even infers for us -- DSL types. Even DSLs with resource-sensitive (affine, linear) types are thus embeddable.

It was recently discovered how to transform, simplify, and generally, optimize DSLs embedded in this style. The optimization rules are reusable, modular, composable, and type- and scope-preserving by construction.

Using a familiar example, the tutorial will explain the general framework of embedding and optimizing DSLs. I intend the tutorial to be interactive, working through problems with active audience participation.

Goals: by the end of the tutorial, the participants will:

implement a simple but realistic and usable DSL of logical circuits (hardware description language)
write many interesting transformations, such as constant propagation, conversion to normal forms, and simplification -- in small and easily composable steps.

Self-Specialising Interpreters and Partial Evaluation

Chris Seaton (Oracle Labs)

Abstract syntax trees are a simple way to represent programs and to implement language interpreters. They can also be an easy way to produce high performance dynamic compilers through combining then with self-specialisation and partial evaluation.

Self-specialisation allows the nodes in a program tree to rewrite themselves with more specialised variants in order to increase performance, such as replacing methods calls with inline caches or to replace stronger operations with weaker ones based on profiled types. Partial evaluation can then take this specialised abstract syntax tree and produce optimised machine code based on it.

We’ll show how these two techniques work and how they have been implemented by Oracle Labs in Truffle and Graal and used in implementations of languages including JavaScript, C, Ruby, R and more.

Comparing approaches to generic programming

José Pedro Magalhães (Standard Chartered)

Datatype-generic programming increases program abstraction and reuse by making functions operate uniformly across different types. Many approaches to generic programming have been proposed over the years, most of them for Haskell, but recently also for dependently typed languages such as Agda. Different approaches vary in expressiveness, ease of use, and implementation techniques.

Some work has been done in comparing the different approaches informally. However, to our knowledge there have been no attempts to formally prove relations between different approaches. We thus present a formal comparison of generic programming libraries. We show how to formalise different approaches in Agda, including a coinductive representation, and then establish theorems that relate the approaches to each other. We provide constructive proofs of inclusion of one approach in another that can be used to convert between approaches, helping to reduce code duplication across different libraries. Our formalisation also helps in providing a clear picture of the potential of each approach, especially in relating different generic views and their expressiveness.

The highs and lows of macros in a modern language

Laurence Tratt (Kings College London)

Staging generic programming

Jeremy Yallop (University of Cambridge)

Generic programming libraries such as Scrap Your Boilerplate eliminate the need to write repetitive code, but typically introduce significant performance overheads. This leaves programmers with the unfortunate choice of writing succinct but slow programs or writing tedious but efficient programs. We show how to systematically transform an implementation of the Scrap Your Boilerplate library in the multi-stage programming language MetaOCaml to eliminate the overhead, making it possible to combine the benefits of high-level abstract programming with the efficiency of low-level code.

Foundations of meta-programming

Martin Berger (University of Sussex)