# Computer Laboratory

Dominic's Part II and Part III/ACS Project Proposals 2013/14

The following are sketches of project ideas and are not complete (or static) proposals. If you are interested, please do get in touch.

• Stencil access specifications for verifying numerical code [Part III/ACS project]

The Computer Lab is currently running a research project applying programming language research to support programming in the sciences, via tools and languages (a slightly longer synopsis can be found here). As part of this project, we are investigating augmenting code with specifications to aid verification, program comprehension and construction, and to improve bug analysis. This project is one such approach, focussed on the verification of the stencil programming pattern, which is common in scientific computing.

Stencil computations are array-based transformations, where each element of an output array, at position i, is computed from a finite set of elements neighbouring position i in some input array(s) (e.g., convolution, the Game of Life). Some stencils are complicated, detailed, and dense (see, for example, this stencil computation in a Navier-Stokes fluid simulator), and errors can be easily introduced by accidentally permuting indices, offsets, or arrays, or even by omitting particular indices.

Many stencil computations have a regular shape following from their derivation (usually a discrete approximation of a system of partial differential equations, see e.g., this tutorial on deriving an approximation for 1D heat equations).
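As a toy illustration of that derivation (the function name and list representation here are assumptions, not part of the project), one explicit Euler step for the 1D heat equation produces exactly the familiar three-point stencil:

```haskell
-- A toy illustration (assumed names; not a project deliverable): one explicit
-- Euler step for the 1D heat equation u_t = alpha * u_xx, with
-- r = alpha * dt / dx^2. The interior update is exactly the three-point
-- stencil u(i-1) - 2*u(i) + u(i+1); boundary values are held fixed.
heatStep :: Double -> [Double] -> [Double]
heatStep r u = head u : zipWith3 step u (drop 1 u) (drop 2 u) ++ [last u]
  where step l c r' = c + r * (l - 2 * c + r')
```

With r = 0.5 a single interior spike [0,1,0] is smoothed to [0,0,0], matching the discrete diffusion behaviour one expects from the derivation.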

The goal of this project is to design and implement a language of abstract stencil specifications, which can be attached to an existing general-purpose language, e.g. Fortran. These specifications will provide a guide to the programmer and a verification technique for the compiler. The idea is to specify abstract descriptions of the shape of the data access pattern for a stencil. For example, for the discrete 1D Laplace operator:

! spec: symmetrical u
do i = 1, n
   v(i) = u(i-1) - 2*u(i) + u(i+1)
end do

This specifies that access to u should be symmetrical (which it is: access one element to the left, the current element, and one element to the right). This might be extended with a "depth", e.g. spec: symmetrical[1] u. (This syntax is speculative.)
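One speculative way such a check could be realised (the names and representation here are illustrative, not CamFort's): record the 1D access pattern as the set of index offsets read, and define "symmetrical" as closure of that set under negation:

```haskell
import Data.List (nub, sort)

-- Speculative sketch (illustrative names, not CamFort's): a 1D stencil's
-- access pattern, recorded as the set of index offsets it reads. Reading
-- u(i-1), u(i) and u(i+1) gives the offsets [-1, 0, 1].
type Offsets = [Int]

-- "symmetrical" holds when the offset set is closed under negation.
symmetrical :: Offsets -> Bool
symmetrical os = canon os == canon (map negate os)
  where canon = sort . nub

-- "symmetrical[d]": symmetrical, reaching exactly depth d on either side.
symmetricalDepth :: Int -> Offsets -> Bool
symmetricalDepth d os = symmetrical os && maximum (map abs os) == d
```

Under this reading, the Laplace stencil above satisfies symmetrical (offsets [-1,0,1]), while a forward-leaning pattern such as [0,1,2] does not.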

As another example:

! spec: forward[2] u, reflexive w
do i = 1, n
   v(i) = u(i) - 2 * u(i+1) + u(i+2) + w(i)
end do

And another:

! spec: symmetrical{i}[1] u, reflexive{j} u
do i = 1, n
   do j = 1, m
      v(i,j) = u(i-1, j) - 2 * u(i,j) + u(i+1, j)
   end do
end do


These specifications may be given per block or per expression, providing different levels of granularity. At compile time, the program will be checked against the specification. A number of predefined specifications may be provided, with a syntax for extending the language with new specifications.

We have developed a front-end compiler infrastructure for Fortran called CamFort, written in Haskell. CamFort parses a large subset of Fortran code and provides a functional DSL embedded in Haskell for defining analyses and refactoring transformations. This project may reuse the CamFort infrastructure as a basis for adding this specification layer on top of Fortran as a pre-processor/verifier.

This project does not require familiarity with Fortran (it will be simple to learn the relevant parts of the language). This project does require a strong interest in languages, compilers, program analysis, and some knowledge of programming in Haskell.

We have a number of scientific computing collaborators within the University who might be able to assist with user studies, should such an evaluation be considered. There are many other possible extensions, including: code generation from stencils, specification inference, dependency specification (e.g., no output dependencies), additional tools to assist the programmer in understanding their stencils and any bugs, and more complicated analyses to deal with manually unrolled/tiled code.
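To give a flavour of the specification-inference extension (this sketch is illustrative only; the data type and function names are assumptions, and the spec names follow the speculative syntax above), one could map the offsets a stencil actually reads to the most specific predefined spec:

```haskell
import Data.List (nub, sort)

-- Illustrative sketch of specification inference: given the index offsets a
-- stencil reads, propose the most specific predefined spec. The spec names
-- follow the speculative surface syntax described above.
data Spec = Reflexive | Forward Int | Backward Int | Symmetrical Int
  deriving (Eq, Show)

inferSpec :: [Int] -> Maybe Spec
inferSpec raw
  | os == [0]                  = Just Reflexive
  | all (>= 0) os              = Just (Forward (maximum os))
  | all (<= 0) os              = Just (Backward (negate (minimum os)))
  | sort (map negate os) == os = Just (Symmetrical (maximum os))  -- closed under negation
  | otherwise                  = Nothing
  where os = sort (nub raw)
```

For example, the forward[2] example above reads offsets [0,1,2] (Forward 2), while the Laplace stencil reads [-1,0,1] (Symmetrical 1).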

Some related work

• Sketching Stencils (Solar-Lezama et al., 2007) - A stencil is specified by an unoptimised "reference" implementation, coupled with a partial implementation, which is then completed by code generation. Specifications are thus just simple implementations. This tool is useful for hand-writing optimised stencils. Has some useful examples, but the specification system is very different to that imagined above.
• The Pochoir stencil compiler (Tang et al., 2011) - A specification language for writing stencils in C++, which are then compiled into an optimised form. Has a simple spec language for describing the access pattern of stencils using vectors (see the Shape blocks). Compared to the specification approach described here, this is a finer-grained, less abstract approach, which may be just as sensitive to errors and potentially tedious for large stencils. Question: can shapes even be associated with particular arrays in this work?
• Auto-Generation and Auto-Tuning of 3D Stencil Codes on GPU Clusters (Zhang, Mueller 2012) - Not immediately related to specifications, but some nice introduction and examples in Section 3.

• TAKEN OCamlLabs: Types for cross-compilation

Increasing numbers of applications target heterogeneous execution environments such as server-side, client-side, GPU, cloud, embedded and web. This is supported by a number of OCaml compiler backends (JavaScript, Java, ARM, LLVM) and libraries allowing interoperability. In such a scenario, certain parts of the code may be compiled only for certain platform-specific environments, but some can be safely shared. This project should extend the OCaml type system to track the execution environments for which an expression can be compiled, and to simplify writing code that cross-compiles for multiple platforms and diverse execution environments.

This project idea comes from the OCamlLabs web site (which includes some links to related material) and is also included in Tomas Petricek's project suggestions. Either of us would be happy to supervise/be involved. As the name suggests, this should be done as an extension to the OCaml compiler, but if you're interested in other tools and languages, do talk to either of us.

• TAKEN Types for units-of-measure in Fortran

Types for units of measure are an extremely useful type system extension for program verification, particularly in science. The basic idea is to augment numerical types with the "units" that they represent, e.g. metres, grams, seconds, etc. This prevents erroneous computations: for example, adding a value in metres to a value in seconds is a type error, but dividing a value in metres by a value in seconds is well-typed, yielding a value in metres per second.

As a more concrete example, F# has a units-of-measure extension: [Kennedy, 2009]

> let gravityOnEarth = 9.808<m/s/s>;;
val gravityOnEarth : float<m/s ^ 2> = 9.808

> let heightOfBuilding = 40.0<m>;;
val heightOfBuilding : float<m> = 40.0

> let speedOfImpact = sqrt (2.0*gravityOnEarth*heightOfBuilding);;
val speedOfImpact : float<m/s> = 28.01142624


Andrew Kennedy has worked on this problem and has a number of great papers on the subject. The aim of this project is to add similar features to the Fortran language, which is popular in scientific computing.

The Computer Lab has a research project working on programming in the sciences, for which it has developed a front-end compiler infrastructure for Fortran called CamFort, written in Haskell. CamFort parses a large subset of Fortran code and provides a functional DSL embedded in Haskell for defining analyses and refactoring transformations over Fortran code.

This project will use the CamFort infrastructure as a basis for adding a new type-checking layer on top of Fortran as a pre-processor. This will perform type checking/inference and then erase the types to produce a normal Fortran program for compilation (assuming the program is well-typed).
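A minimal sketch of what the core of such a units layer might look like (an assumed design for illustration, not CamFort's actual one): represent a unit as a map from base-unit names to integer exponents, so multiplication adds exponents, division subtracts them, and addition demands identical units.

```haskell
import qualified Data.Map.Strict as M

-- Illustrative core for monomorphic units checking (assumed design): a unit
-- is a map from base-unit name to integer exponent, so m/s^2 is
-- fromList [("m",1),("s",-2)] and a dimensionless value is the empty map.
type Unit = M.Map String Int

unitless, metre, second :: Unit
unitless = M.empty
metre    = M.fromList [("m", 1)]
second   = M.fromList [("s", 1)]

-- Multiplication adds exponents; division subtracts them. Zero exponents
-- are dropped so that m * (1/m) is literally the dimensionless unit.
mulU, divU :: Unit -> Unit -> Unit
mulU a b = M.filter (/= 0) (M.unionWith (+) a b)
divU a b = mulU a (M.map negate b)

-- Addition and subtraction are only well-typed between identical units.
addU :: Unit -> Unit -> Maybe Unit
addU a b | a == b    = Just a
         | otherwise = Nothing
```

In this sketch, addU metre second rejects m + s (yielding Nothing), while divU metre (mulU second second) produces the acceleration unit m/s^2, mirroring the F# example above.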

This project does not require familiarity with Fortran (it will be simple to learn the relevant parts of the language). This project does require an enthusiasm for types and language design, as you will need to design the extension yourself, a strong interest in languages, compilers, and type theory, and some knowledge of programming in Haskell.

Evaluation

• Coverage - The extension should support monomorphic units-of-measure types with standard arithmetic operations (multiplication, division, addition, subtraction, absolute value) and dimensionless types.
• Correct type system - The type system should be correct with respect to an abstract specification of the type theory. This will, at the very least, require careful unit testing of the type-checking algorithm.
• Low overhead - The additional type system should not significantly slow down compilation, i.e., the complexity of type checking should be low.
• Correct translation - Adding units-of-measure types should not alter the semantics of a program. That is, if the program is well-typed, then it should have exactly the same behaviour as if the units-of-measure types were erased. This will most likely need to be tested by incremental unit tests.

• TAKEN Unified notation for mathematically-structured Haskell

There have been many surprising and pleasing discoveries of concepts in abstract mathematics that capture the essence of classes of computations common in programming. These mathematical concepts essentially provide specifications of useful design patterns in programming, such as functors, monads, comonads, and more. These concepts are used heavily in functional programming and are even pervading non-functional languages.

Haskell, as implemented by the GHC compiler, has a number of syntactic extensions to simplify programming with these mathematical structures: the do notation for monads, the codo notation for comonads, and the arrow notation for more general structures. Since monads can be used to abstract effectful computation, one view of the do notation is that it provides an embedded domain-specific language of effects. This contrasts with languages such as ML, where effectful computations can be programmed using the usual syntax.
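As a reminder of what the do notation abbreviates, the two definitions below are equivalent; do is purely syntactic sugar for the underlying (>>=) operator (a standard example, shown here with the Maybe monad):

```haskell
-- The do block and the explicit (>>=) form are equivalent: do is sugar
-- for (>>=) and return. Shown with the Maybe monad: halve an even number
-- then increment, failing (Nothing) on odd input.
halveThenIncr :: Int -> Maybe Int
halveThenIncr n = do
  m <- if even n then Just (n `div` 2) else Nothing
  return (m + 1)

halveThenIncr' :: Int -> Maybe Int
halveThenIncr' n =
  (if even n then Just (n `div` 2) else Nothing) >>= \m -> return (m + 1)
```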

The aim of this project is to produce a new Haskell front-end that eliminates the do notation in favour of normal let binding, but where the type system resolves whether the computation is pure or impure (monadic) and produces the correct implementation as necessary. The translation on the surface looks quite simple, e.g.

e = let y = f x              e = do y <- f x
        z = g y      --->           z <- g y
    in  y + z                       return (y + z)


The idea is that ML-style programming can be done in Haskell, but still with all the benefits of the monad abstraction.
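To make the intended rewriting concrete, here is a toy sketch over a miniature expression type (illustrative only: the data type and function names are assumptions, and the real project would rewrite the full Haskell AST):

```haskell
-- A toy rendering of the intended rewrite: a miniature expression type with
-- let-blocks and do-blocks over named bindings.
data Expr = Var String
          | App Expr Expr
          | LetIn [(String, Expr)] Expr   -- let x = e; ... in body
          | DoOf  [(String, Expr)] Expr   -- do x <- e; ...; return body
  deriving (Eq, Show)

-- When the type system decides the bindings are monadic, each "let x = e"
-- becomes "x <- e" and the body is wrapped in return (the DoOf form);
-- otherwise the expression is left unchanged.
toMonadic :: Expr -> Expr
toMonadic (LetIn binds body) = DoOf binds body
toMonadic e                  = e
```

The real transformation would additionally be driven by type information, as discussed under Challenges below, rather than applied unconditionally.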

Extension: The approach can then be extended to include the codo notation for comonads (since the codo notation is not yet standard in GHC, the original prototype of codo can be reused).

A further extension is to support the arrow notation.

Challenges: There are some difficulties in deciding exactly when to perform the translation and when to preserve the let-notation. In the above, if f, g :: Int -> m Int then the left-hand side will fail to type-check if it is interpreted as a normal let-binding, since y :: m Int but g expects a parameter of type Int. In this case it is easy to choose the translation to apply. However, if f, g :: a -> m Int then the code will type-check if there is an instance of Num (m Int). In this case the programmer needs to indicate to the translation that the monadic interpretation is expected, for which some additional syntax may be required, for example, a special type signature or a type constructed using existing features, e.g. e :: Monadic (m Int).

Format: Much of the existing Haskell infrastructure can be reused for parsing the simplified syntax (the GHC/Haskell parser is available through haskell-src-exts). The main work will consist of transformations over the Haskell AST to perform the rewriting.

Pre-requisites: A fair knowledge of Haskell will be needed, and some familiarity with monads and their use.