L25 background reading

This page lists some papers for background reading on L25, organised to follow the structure of the course.

Many of the papers are hosted by the ACM Digital Library. There is normally a charge for accessing this service, but you can access it for free from the university network. You are not expected to buy any of the papers. If you wish to read the papers elsewhere, either download them while on the university network, or use the Computer Lab VPN to access the ACM Digital Library from any other connection.

The course assessment does not require that you read these papers, but you may find that they are good starting points for exploring areas introduced in the course that you find interesting. You may also find inspiration for a miniproject idea in one of the papers.

This is by no means an exhaustive list of relevant papers. Following citations is often a good way of finding other interesting things to read. This works both backwards (reading papers from the bibliography) and forwards: the ‘Cited By’ tab in a paper’s ACM Digital Library entry often links to interesting follow-on work.

Understanding compiler intermediate representations

Fred Chow provides a good overview of the requirements of a good IR [1]. This article also covers some of the history of IR design.

Static Single Assignment (SSA) form was introduced by the authors of [2], whose paper also provides a good overview of the algorithms used to construct it.
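
To give a flavour of SSA form, the sketch below renames variables in straight-line three-address code so that each name is assigned exactly once. This is only the renaming step: the algorithm in [2] additionally places phi-functions using dominance frontiers to handle control flow, which this toy deliberately omits.

```python
# Minimal sketch of SSA renaming for straight-line code (no control flow).
# Each redefinition of a variable gets a fresh version: x -> x1, x2, ...

def to_ssa(instructions):
    """instructions: list of (dest, op, lhs, rhs) three-address tuples."""
    version = {}   # variable -> latest version number
    current = {}   # variable -> current SSA name

    def use(name):
        # Uses refer to the most recent definition; undefined names
        # (incoming arguments, constants) pass through unchanged.
        return current.get(name, name)

    result = []
    for dest, op, lhs, rhs in instructions:
        lhs, rhs = use(lhs), use(rhs)          # rename uses first
        version[dest] = version.get(dest, 0) + 1
        current[dest] = f"{dest}{version[dest]}"
        result.append((current[dest], op, lhs, rhs))
    return result

code = [
    ("x", "+", "a", "b"),   # x = a + b
    ("x", "*", "x", "c"),   # x = x * c  (redefines x)
    ("y", "+", "x", "a"),   # y = x + a  (uses the latest x)
]
for inst in to_ssa(code):
    print(inst)
# -> ('x1', '+', 'a', 'b')
#    ('x2', '*', 'x1', 'c')
#    ('y1', '+', 'x2', 'a')
```

After renaming, every use refers unambiguously to a single definition, which is what makes dataflow analyses over SSA form so cheap.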

The original LLVM paper [3] provides a good overview of the original motivations and design of LLVM. You will note how much the design has changed over time. Contrast this with the Java bytecode specification [4], which was created for very different goals.

Dynamic dispatch and duck typing

The canonical paper for polymorphic inline caching in Self [5] is still very relevant.
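The idea can be sketched in a few lines: a call site keeps a small per-class cache of looked-up methods, so repeated calls on already-seen receiver classes skip the full method lookup. The names below are illustrative only, not the Self implementation from [5].

```python
# Sketch of a polymorphic inline cache (PIC). Each call site remembers the
# method found for every receiver class it has seen; only the first call on
# a new class takes the slow lookup path.

class CallSite:
    def __init__(self, selector):
        self.selector = selector
        self.cache = {}            # receiver class -> method (the PIC)
        self.slow_lookups = 0      # count full lookups, for demonstration

    def call(self, receiver, *args):
        klass = type(receiver)
        method = self.cache.get(klass)
        if method is None:                      # cache miss: slow path
            self.slow_lookups += 1
            method = getattr(klass, self.selector)
            self.cache[klass] = method          # grow the PIC
        return method(receiver, *args)          # hit: direct dispatch

class Duck:
    def speak(self):
        return "quack"

class Dog:
    def speak(self):
        return "woof"

site = CallSite("speak")
animals = [Duck(), Dog(), Duck(), Dog(), Duck()]
print([site.call(a) for a in animals])   # five dynamic calls...
print(site.slow_lookups)                 # ...only two slow lookups
```

A real implementation stores the cache as patched machine code at the call site rather than a dictionary, and the recorded class distribution also feeds later speculative optimisations.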

The lectures skip over the gory details of C++ implementation. If you really want to know how it works, then the ABI docs explain it all [6].

Trace-based JITs originate in dynamic recompilation (not dynamic languages at all!), with Dynamo [7] introducing the technique. Their use in dynamic language implementations is much more recent [8], with the first implementation appearing in Adobe Flash (for ActionScript) and then being contributed to Mozilla (Tamarin).

Vectorisation and loop optimisation

The SLP vectorizer in LLVM is based on work from 2000 [9]. A more flexible version of the algorithm [10] (at a cost in complexity, which translates directly into longer compilation times) was recently published by researchers in the Computer Lab.
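The core SLP idea can be illustrated with a toy packer: scan straight-line code for runs of isomorphic statements (same operation, adjacent array elements) and merge each run into a single wide vector statement. Real SLP (and PSLP [10]) works on an IR with dependence and cost analysis; this sketch shows only the pattern-matching step, with an invented statement encoding.

```python
# Toy sketch of superword-level parallelism (SLP) packing.
# Statements are encoded as (op, array, index), e.g. ('add', 'a', 0)
# standing for some scalar operation on a[0].

def pack_slp(stmts):
    packed, i = [], 0
    while i < len(stmts):
        op, arr, idx = stmts[i]
        j = i + 1
        # Extend the pack while statements are isomorphic (same op, same
        # array) and touch consecutive indices.
        while j < len(stmts) and stmts[j] == (op, arr, idx + (j - i)):
            j += 1
        if j - i > 1:
            packed.append((f"vector_{op}", arr, idx, j - i))  # one wide op
        else:
            packed.append(stmts[i])                           # leave scalar
        i = j
    return packed

scalar = [("add", "a", 0), ("add", "a", 1), ("add", "a", 2), ("add", "a", 3),
          ("mul", "b", 7)]
print(pack_slp(scalar))
# -> [('vector_add', 'a', 0, 4), ('mul', 'b', 7)]
```

The hard parts that [9] and [10] address, and that this sketch ignores, are proving the packed statements independent and deciding (via a cost model) when packing actually pays off on the target.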

There are a lot of papers on polyhedral optimisation in general, but for the specific interaction with LLVM, Tobias Grosser’s diploma thesis [11] provides a lot of detail.

Garbage collection

The best overview of garbage collection techniques and the relationship between various approaches is David F. Bacon’s Unified Theory [12]. Most collectors use a combination of techniques discussed in this paper.
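The duality at the heart of [12] is that tracing computes liveness from the roots (what is reachable) while reference counting computes it from the edges (what is no longer pointed to). The hedged sketch below runs both on the same toy heap; on an acyclic heap they agree, and cycles are exactly where plain reference counting fails and hybrid collectors add tracing or trial deletion.

```python
# Toy heap: object -> list of objects it references.
heap = {
    "root": ["a", "b"],
    "a": ["c"],
    "b": [],
    "c": [],
    "d": ["e"],                # d and e are unreachable garbage
    "e": [],
}

def trace(heap, root="root"):
    """Mark phase of a tracing collector: everything reachable is live."""
    live, stack = set(), [root]
    while stack:
        obj = stack.pop()
        if obj not in live:
            live.add(obj)
            stack.extend(heap[obj])
    return live

def refcount_collect(heap, root="root"):
    """Naive reference counting with cascading decrements (acyclic only)."""
    counts = {obj: 0 for obj in heap}
    counts[root] += 1                      # the root itself is referenced
    for refs in heap.values():
        for obj in refs:
            counts[obj] += 1
    dead = [obj for obj, c in counts.items() if c == 0]
    freed = set()
    while dead:
        obj = dead.pop()
        freed.add(obj)
        for o in heap[obj]:                # freeing drops outgoing references
            counts[o] -= 1
            if counts[o] == 0 and o not in freed:
                dead.append(o)
    return freed

print(sorted(set(heap) - trace(heap)))     # garbage found by tracing
print(sorted(refcount_collect(heap)))      # garbage found by refcounting
```

Both print `['d', 'e']` here; make d and e point at each other and the reference-counting version finds nothing, which is why real collectors combine the two families of technique.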

For something a bit more modern, the C4 collector [13] is close to the state of the art for high-end systems. Some more work from IBM shows the state of the art at the embedded end [14].


  1. Fred Chow. Intermediate Representation. Commun. ACM 56, 12 (2013), 57–62.
  2. Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman and F. Kenneth Zadeck. Efficiently Computing Static Single Assignment Form and the Control Dependence Graph. ACM Trans. Program. Lang. Syst. 13, 4 (1991), 451–490.
  3. Chris Lattner and Vikram Adve. LLVM: A Compilation Framework for Lifelong Program Analysis and Transformation. Proceedings of the International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimization, IEEE Computer Society (2004), 75–.
  4. The Java Virtual Machine Specification.
  5. Urs Hölzle, Craig Chambers and David Ungar. Optimizing Dynamically-Typed Object-Oriented Languages with Polymorphic Inline Caches. 1991.
  6. Itanium C++ ABI.
  7. Vasanth Bala, Evelyn Duesterwald and Sanjeev Banerjia. Dynamo: A Transparent Dynamic Optimization System. Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation, ACM (2000), 1–12.
  8. Andreas Gal, Brendan Eich, Mike Shaver, David Anderson, David Mandelin, Mohammad R. Haghighat, Blake Kaplan, Graydon Hoare, Boris Zbarsky, Jason Orendorff, Jesse Ruderman, Edwin W. Smith, Rick Reitmaier, Michael Bebenita, Mason Chang and Michael Franz. Trace-based Just-in-time Type Specialization for Dynamic Languages. Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation, ACM (2009), 465–478.
  9. Samuel Larsen and Saman Amarasinghe. Exploiting Superword Level Parallelism with Multimedia Instruction Sets. Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation, ACM (2000), 145–156.
  10. V. Porpodas, A. Magni and T.M. Jones. PSLP: Padded SLP Automatic Vectorization. Code Generation and Optimization (CGO), 2015 IEEE/ACM International Symposium on (2015), 190–201.
  11. Tobias Grosser. Enabling Polyhedral Optimizations in LLVM. Diploma thesis, 2011.
  12. David F. Bacon, Perry Cheng and V. T. Rajan. A Unified Theory of Garbage Collection. Proceedings of the 19th Annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, ACM (2004), 50–68.
  13. Gil Tene, Balaji Iyengar and Michael Wolf. C4: The Continuously Concurrent Compacting Collector. Proceedings of the International Symposium on Memory Management, ACM (2011), 79–88.
  14. David F. Bacon, Perry Cheng and V. T. Rajan. POPL 2003: A Real-time Garbage Collector with Low Overhead and Consistent Utilization. SIGPLAN Not. 48, 4S (2013), 58–71.