Computer Laboratory

OCaml Labs

The OCaml Compiler

We maintain and extend the core OCaml compiler toolchain and runtime. The work ranges from day-to-day tasks, such as submitting bug fixes, to longer-term improvements to the type system and runtime libraries (see below).

We are actively engaging with the wider OCaml community to ensure that any improvements and modifications we propose are thoroughly discussed, well-formulated and maintainable.

Tasks

Timeline: Sep '12 to Apr '14

  • Leo White: Open types

  • Leo White: Record disambiguation

  • Leo White: Syntax extensions

  • Leo White: Namespaces plan

  • Stephen Dolan: Multicore (planning)

  • Jeremy Yallop: Compile-time meta programming (planning)

  • Mark Shinwell: Emission of DWARF debugging information

  • Xavier Clerc: OCaml Java 2.0 preview


Open types

Complete by Leo White (Oct 2012 - Jan 2013)
Github   Website  

Add open extensible types to OCaml. One open type already exists within OCaml: the exn type used for exceptions. This project extends that mechanism so that programmers can define their own open types. Open types have been proposed for functional languages a number of times, for instance as part of a solution to the expression problem (Löh and Hinze, "Open Data Types and Open Functions"). Unlike exn, these extensible types can have type parameters, allowing for extensible GADTs.

For example:

type foo = ..

type foo += A

type foo += B of int

let is_a x =
  match x with
    A -> true
  | _ -> false
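
The point about type parameters can be illustrated with a small extensible GADT. This is only a sketch, assuming the syntax as it appears in OCaml releases that include extensible variants; the names expr, Int, Bool, Add and eval are made up for illustration.

```ocaml
(* An open type with one type parameter. *)
type _ expr = ..

(* Constructors are added after the fact, each fixing the parameter. *)
type _ expr += Int : int -> int expr
type _ expr += Bool : bool -> bool expr
type _ expr += Add : int expr * int expr -> int expr

(* Because the type is open, a catch-all case is always required. *)
let rec eval : type a. a expr -> a = function
  | Int n -> n
  | Bool b -> b
  | Add (l, r) -> eval l + eval r
  | _ -> failwith "eval: unknown constructor"
```

Here eval (Add (Int 1, Int 2)) has type int and eval (Bool true) has type bool. Constructors added later by other modules fall through to the final case.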

Record disambiguation

Complete by Leo White (Sep 2012 - Dec 2012)
Mantis   Bug report  

Type-based record disambiguation: Leo helped with Jacques Garrigue's record-disambiguation branch of OCaml. This branch uses type information to disambiguate between record labels, and between variant constructors, that share the same name. For discussions of the semantics of this feature, see Gabriel's or Alain's blog posts. Leo rewrote the record-disambiguation branch to use an alternative semantics and improved the error messages. The branch has since been merged into OCaml trunk.
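
As a rough illustration of what type-based disambiguation permits, the sketch below assumes the behaviour of OCaml releases that include the merged branch. The type and function names are invented for the example, and it does not attempt to show the differences between the alternative semantics that were discussed.

```ocaml
type point2d = { x : float; y : float }
type point3d = { x : float; y : float; z : float }

(* The labels of point3d shadow those of point2d, but the annotation on p
   tells the type checker which x and y are meant. *)
let norm2 (p : point2d) = sqrt ((p.x *. p.x) +. (p.y *. p.y))

(* The expected type also selects the record type for a literal. *)
let p : point2d = { x = 3.; y = 4. }

type colour = Red | Green
type light = Red | Amber

(* The annotation picks colour's Red rather than the later light's Red. *)
let paint : colour = Red

let is_stop (l : light) =
  match l with
  | Red -> true
  | Amber -> false
```

Without the annotations, the most recently defined label or constructor is the default candidate, which is why the annotations matter here.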


Syntax extensions

Complete by Leo White (Dec 2012 - Jun 2013)
Working group   Blog  

Since its creation, camlp4 has proven to be a very useful tool. People have used it to experiment with new features for OCaml and to provide interesting meta-programming facilities. However, there is general agreement that camlp4 is too powerful and complex for the applications it is most commonly used for, and there is a growing movement to provide a simpler alternative.

A working group (wg-camlp4@lists.ocaml.org) was formed to discuss the future of syntax extensions in OCaml. Its aim is to formulate a solid transition plan to create a 'basic OCaml ecosystem' that does not require camlp4. Alain Frisch's introductory email has more detail and can be found in the archive.


Namespaces plan

Complete by Leo White (Feb 2013 - Apr 2013)
Blog   Epic Mail Thread  

Namespaces provide a means for grouping the components of a library together.

Up to now this has been achieved using the OCaml module system. Since the components of an OCaml library are modules, a module can be created that contains all the components of the library as sub-modules. However, there are some critical problems with creating a single module containing the whole library:

  • The module is a single unit that has to be linked or not as a whole. This means that any program using part of the library must include the entire library.

  • The module is a choke-point in the dependency graph. If a file depends on one thing in the library then it needs to be recompiled if anything in the library changes.

  • Opening a very large module is slow and can seriously affect build performance.

These problems are caused by the runtime semantics of modules. Namespaces have no runtime semantics and could provide a solution to these problems.
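
The single-module approach described above can be sketched as follows. The library name Mylib and its components are hypothetical, and a real library would normally build the wrapper from separate files rather than write it inline.

```ocaml
(* The whole library is one module value; its components are sub-modules. *)
module Mylib = struct
  module Shapes = struct
    let area_of_square side = side *. side
  end

  module Text = struct
    let shout s = String.uppercase_ascii s
  end
end

(* A client that only needs Shapes still refers to Mylib as a whole, so it
   links against, and depends on, everything inside Mylib. *)
let area = Mylib.Shapes.area_of_square 2.0
```

Because Mylib is an ordinary module with a runtime representation, the client's use of Mylib.Shapes brings Mylib.Text along with it. This is the coupling that a namespace, having no runtime representation, is meant to avoid.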


Multicore

Planning by Stephen Dolan (Mar 2013 -)
Github  

Building an efficient multi-core garbage collector remains a challenging research issue. We will begin by investigating techniques that do not involve fine-grained locking, which has been shown to be error-prone, difficult to maintain, and harmful to sequential performance. Instead, we prefer to isolate threads to individual cores, with region-based heaps that can be garbage collected separately. This technique has been successfully demonstrated in the MultiMLton system and runtime, and we will implement it for OCaml and its large body of third-party code.


Compile-time meta programming

Planning by Jeremy Yallop (Apr 2013 -)

There are currently two ways of generating OCaml code from within OCaml programs: camlp4 (and its successor, ppx), which produces untyped syntax, and MetaOCaml, which produces typed code.

Our research into static metaprogramming aims to combine the advantages of the two approaches. The resulting system will allow users to write code generators that enjoy the guarantees provided by MetaOCaml, but that run entirely during compilation, like camlp4 extensions. Typeful generation of code based on statically-available information has applications in a number of areas, including deriving-style generic programming extensions, HTML templates, foreign function interface generation and embedded DSLs.
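
To give a flavour of generating code from statically-available information, the sketch below specialises a power function to a known exponent. It is only an analogy in plain OCaml: it builds closures at run time, whereas MetaOCaml would build typed code values and the proposed system would run the generator during compilation. The name gen_power is hypothetical.

```ocaml
(* Build a function computing x^n. The recursion on n happens once, when the
   generator runs, not on every call of the resulting function. *)
let rec gen_power (n : int) : int -> int =
  if n = 0 then fun _ -> 1
  else if n mod 2 = 0 then
    let half = gen_power (n / 2) in
    fun x ->
      let y = half x in
      y * y
  else
    let rest = gen_power (n - 1) in
    fun x -> x * rest x

(* The exponent is known here, so the structure of cube is fixed up front. *)
let cube = gen_power 3
```

The generator is an ordinary typed OCaml function, so an ill-typed result is rejected before it ever runs. That is the kind of guarantee the project aims to keep while moving generation to compile time.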


Emission of DWARF debugging information

In Progress by Mark Shinwell (Jan 2013 - Jan 2014)
4.00.1-allocation-profiling  

Debuggers such as the GNU debugger gdb are valuable tools when tracking down problems in low-level or parallel applications. The programmer experience when using such a debugger to examine natively-compiled OCaml programs currently lacks lustre. Recent versions of the compiler can emit a limited amount of debugging information, which enables the recovery of correct stack traces in the debugger. However, names of functions still appear in mangled form, it is not possible to reference local variables by name, and traversal of OCaml values is troublesome. This is unfortunately by no means an exhaustive list of deficiencies.

This project aims to equip the native-code OCaml compiler and the GNU debugger with the necessary infrastructure to improve debugging of OCaml programs. The compiler will be enhanced to emit the standard DWARF debugging information format in order to describe the naming and placement of data together with relevant type information. At the same time the debugger will gain functionality to understand the OCaml-specific parts of this information including the ability to demangle OCaml names. It is planned to implement much of the DWARF output stage in the compiler and the debugger-side support in libraries such that they might be re-used in other projects.
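
To make the demangling step concrete, here is a minimal sketch. It assumes native-code symbols of the shape caml<Module>__<name>_<stamp>, for example camlList__map_1234, and it ignores escaped characters and names that themselves contain a double underscore. The function demangle and its helpers are hypothetical and are not the project's implementation.

```ocaml
(* True when s is non-empty and consists only of decimal digits. *)
let is_digits (s : string) : bool =
  s <> ""
  && (let ok = ref true in
      String.iter (fun c -> if c < '0' || c > '9' then ok := false) s;
      !ok)

(* Drop a trailing "_<digits>" stamp, if present. *)
let strip_stamp (s : string) : string =
  match String.rindex_opt s '_' with
  | Some i when is_digits (String.sub s (i + 1) (String.length s - i - 1)) ->
    String.sub s 0 i
  | _ -> s

(* Replace each "__" separator with ".". *)
let dots (s : string) : string =
  let n = String.length s in
  let buf = Buffer.create n in
  let rec go i =
    if i >= n then ()
    else if i + 1 < n && s.[i] = '_' && s.[i + 1] = '_' then begin
      Buffer.add_char buf '.';
      go (i + 2)
    end else begin
      Buffer.add_char buf s.[i];
      go (i + 1)
    end
  in
  go 0;
  Buffer.contents buf

(* Return None for symbols that do not carry the assumed "caml" prefix. *)
let demangle (sym : string) : string option =
  let n = String.length sym in
  if n > 4 && String.sub sym 0 4 = "caml" then
    Some (dots (strip_stamp (String.sub sym 4 (n - 4))))
  else None
```

Under these assumptions, demangle "camlList__map_1234" yields Some "List.map", and a symbol without the prefix, such as "main", yields None.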

It is hoped that, as support for native-code debugging of OCaml programs in the traditional manner evolves, it will become more easily possible to build more advanced debugging tools. These might exploit the scripting capabilities of gdb, for example, and target environments such as large-scale concurrent systems.

This work is ongoing in the dwarf branch of the OCaml repository.


OCaml Java 2.0 preview

Complete by Xavier Clerc (Apr 2013 - Aug 2013)
Homepage  

OCaml Java is a compiler from OCaml source code to Java bytecode that can run on any modern Java runtime. This is an interesting way to explore the multicore runtime performance of OCaml with a highly concurrent collector, as is present in the latest JVMs.

The goal of this work is to stabilise and release the preview of 2.0, which greatly improves CPU utilisation and memory footprint.