Computer Laboratory

Course pages 2013–14

Modern Compiler Design

Short Exercises (20%)

Complete any 3 of the 5 exercises on the exercise sheet. These involve modifying either a simple language for generating cellular automata or a toy dynamic language runtime

The MCS lab machines are set up with a copy of the LLVM 3.3 and a script that will set up your environment variables correctly to use it:

$ . /ux/clteach/L25/ 
Make sure you source this script, don't try to run it

There is also a copy of the CellularAutomata compiler in /ux/clteach/L25, which you can copy into your home directory and modify / compile. The .ca files show some simple programs written in this language.

The DSL used by the CellularAutomata example is described in this presentation, from the slide 'A Simple DSL' onwards

Miniproject and Report (80%)

Demonstrate the effectiveness (or lack thereof) of some optimisation technique of your choice. Project ideas must be approved in advance.

Don't forget that you are primarily marked based on the quality of your evaluation (did it give a speedup? On what kinds of workloads? If not, why not?), not on raw performance improvements.

Some possible examples projects:

Inverted dispatch tables for Objective-C

Modify the GNUstep Objective-C runtime to use inverted dispatch tables. Extend clang to emit specialised call sites, one per selector, directly referencing the inverted dispatch tables.

The runtime must be modified to create and update the tables. The call sites should fall back to using the existing message send mechanism if there is no inverted dispatch table.

Polyhedral DCE for Polly

Implement polyhedral dead code elimination in polly. There are some test cases available and an existing implementation in ppcg can provide inspiration.

Speculating loads vs concurrency

LLVM will speculatively hoist loads outside of conditionals if it can demonstrate that there is no observable side effect within the abstract machine. For example, consider the following:

int g;
int foo(int cond) {
  if (cond)
    return g;
  return 0;

LLVM will hoist the load of g outside of the loop. This is a good optimisation in single-threaded code, but in multithreaded code can cause false sharing. Can you come up with an heuristic for disabling this transform that doesn't cause a slowdown in single-threaded code but avoids the false sharing problem?

See the LLVMDev thread 'Curiosity about transform changes under Sanitizers (Was: [PATCH] Disable branch folding with MemorySanitizer)' for more details and a test cast.

Analyse LLVM's lowering of atomics

Peter Sewell et al. have discovered that LLVM has a very conservative approach to lowering atomics on ARM, such that an algorithm that relaxes the memory orderings can end up being compiled to code with a stronger model. Implement a set of lockless algorithms and provide as set of test cases, with the generated assembly annotated to indicate where the back end has been overly aggressive in inserting barriers. Analyse the slowdown that these overly aggressive barriers cause.

Extend the Cellular Automata language for parallelism

The ordering of writes to the global registers in the CA language from the assignments is undefined. Extend the language by providing some form of explicit synchronisation (for example, barriers, mutexes) and an implementation that uses autovectorisation, autoparallelism, or both, yet provides deterministic output.

Structure reorganisation

For large structures, you can achieve better performance if fields in a structure that are accessed together are close enough that they are in the same cache line. If fields are never accessed together (but there is concurrency), it may be better to add padding such that some are never in the same line. Can you write a pass that will rearrange the layouts of structure types to enforce this? Make sure that you don't modify any types that are passed outside the current module...