Part III / ACS project ideas

Hardware-assisted generational garbage collection with CHERI

Prerequisites: Knowledge of C, graph theory, virtual memory, and MIPS assembly

Sun Labs' Project Maxwell observed that the objects in the young generation of a generational garbage collector and the objects held in the cache are largely the same set. The project added a full object memory to SPARC, which allowed the young-generation collector to run entirely in the cache.

CHERI adds a capability-oriented view of virtual memory to the MIPS ISA. With CHERI, the hardware can distinguish between pointers (capabilities to memory) and other data. A previous project implemented a purely software garbage collector for C.

It should be possible to combine these two approaches into a single system that provides generational garbage collection. The basic idea is a young generation implemented as a semi-space collector in SRAM (a scratchpad) mapped into a range of the virtual address space. When a capability pointing into this range is stored in the cache, it would be marked as a root. When a cache line containing such a capability is flushed to main memory, the hardware would trap to software, which would promote the referenced object to an older generation.
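The promotion path can be prototyped in software before committing to hardware. The sketch below is a hypothetical C model, not CHERI or BERI code: a static array stands in for the SRAM nursery, a cache line is an array of slots with a per-slot tag bit playing the role of the CHERI capability tag, and `on_line_flush()` is what the trap handler might do on write-back. All names are illustrative assumptions, and the model assumes capabilities point at object starts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NURSERY_BYTES 4096   /* stand-in for the SRAM scratchpad */
#define LINE_SLOTS 4         /* capability-sized slots per modelled cache line */

typedef struct object {
    struct object *forward;  /* old-generation copy, once promoted */
    size_t size;             /* payload size in bytes */
    unsigned char payload[];
} object;

static _Alignas(16) unsigned char nursery[NURSERY_BYTES];
static size_t nursery_top;

/* A cache line: data slots plus one tag bit per slot, modelling the CHERI
 * tag that marks a slot as holding a capability rather than plain data. */
typedef struct {
    void *slot[LINE_SLOTS];
    bool tag[LINE_SLOTS];
} cache_line;

static bool in_nursery(const void *p) {
    uintptr_t a = (uintptr_t)p, base = (uintptr_t)nursery;
    return a >= base && a < base + NURSERY_BYTES;
}

/* Bump allocation in the nursery; returns NULL when it is full (a real
 * design would run a semi-space collection of the young generation here). */
static object *nursery_alloc(size_t size) {
    size_t need = (sizeof(object) + size + 15) & ~(size_t)15;
    if (need > NURSERY_BYTES - nursery_top) return NULL;
    object *o = (object *)(void *)(nursery + nursery_top);
    nursery_top += need;
    o->forward = NULL;
    o->size = size;
    return o;
}

/* Copy an object to the (malloc-backed) old generation, leaving a
 * forwarding pointer so that later traps find the same copy. */
static object *promote(object *o) {
    if (o->forward == NULL) {
        object *copy = malloc(sizeof(object) + o->size);
        if (copy == NULL) abort();
        memcpy(copy, o, sizeof(object) + o->size);
        copy->forward = NULL;
        o->forward = copy;
    }
    return o->forward;
}

/* What the trap handler might do when a line is about to be written back:
 * every tagged slot that points into the nursery is redirected to the
 * promoted copy, so no nursery pointer reaches main memory. */
static void on_line_flush(cache_line *line) {
    for (int i = 0; i < LINE_SLOTS; i++) {
        if (line->tag[i] && in_nursery(line->slot[i])) {
            assert(((uintptr_t)line->slot[i] & 15) == 0); /* object start */
            line->slot[i] = promote(line->slot[i]);
        }
    }
}
```

Only the directly referenced object is promoted here. Pointer fields of the copy that still refer to the nursery would also need handling, either by promoting transitively or by relying on later flush traps for the lines that hold them.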

The young generation could be collected entirely asynchronously, with read barriers implemented by trapping loads from the scratchpad, although an initial implementation may prefer to collect it synchronously.
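A trapping load from the scratchpad acts like a Baker-style read barrier: the mutator never holds a from-space pointer, because loading one copies the object first. The following is a minimal single-threaded sketch, assuming fixed-size objects in two static semi-spaces. `load_field()` models the trap handler and `collect_step()` is a hypothetical stand-in for the asynchronous collector's background work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SPACE_OBJS 64
#define NFIELDS 2

typedef struct obj {
    struct obj *forward;         /* to-space copy once evacuated */
    struct obj *field[NFIELDS];  /* pointer fields */
    long value;                  /* plain data */
} obj;

static obj space_a[SPACE_OBJS], space_b[SPACE_OBJS];
static obj *from_space = space_a, *to_space = space_b;
static size_t to_top;  /* next free to-space slot (copies and new objects) */
static size_t scan;    /* Cheney scan index for background work */

static bool in_from_space(const obj *p) {
    uintptr_t a = (uintptr_t)p, base = (uintptr_t)from_space;
    return a >= base && a < base + sizeof space_a;
}

/* Copy an object to to-space unless it has already been forwarded. */
static obj *evacuate(obj *p) {
    if (p == NULL || !in_from_space(p)) return p;
    if (p->forward == NULL) {
        assert(to_top < SPACE_OBJS);
        obj *copy = &to_space[to_top++];
        *copy = *p;
        copy->forward = NULL;
        p->forward = copy;
    }
    return p->forward;
}

/* Mutator allocation goes straight into to-space. */
static obj *alloc(long value) {
    assert(to_top < SPACE_OBJS);
    obj *o = &to_space[to_top++];
    memset(o, 0, sizeof *o);
    o->value = value;
    return o;
}

/* The read barrier: the work a trap on a from-space load would do.
 * The field is updated, so it traps at most once per collection cycle. */
static obj *load_field(obj *o, int i) {
    o->field[i] = evacuate(o->field[i]);
    return o->field[i];
}

/* Start a collection: swap the semi-spaces and evacuate the roots. */
static void flip(obj **roots, size_t nroots) {
    obj *t = from_space;
    from_space = to_space;
    to_space = t;
    to_top = scan = 0;
    for (size_t i = 0; i < nroots; i++) roots[i] = evacuate(roots[i]);
}

/* One increment of background collection: scan one to-space object.
 * Returns false once every reachable object has been copied and scanned. */
static bool collect_step(void) {
    if (scan == to_top) return false;
    obj *o = &to_space[scan++];
    for (int i = 0; i < NFIELDS; i++) o->field[i] = evacuate(o->field[i]);
    return true;
}
```

In the proposed hardware, the equivalent of `load_field()` would run only when a load actually touches from-space, so loads of already-evacuated objects would need no software intervention. Making collection synchronous amounts to calling `collect_step()` in a loop immediately after `flip()`.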

Hardware page-table walker for BERI

BERI implements the MIPS R4K system-level interface, which mandates a software-managed TLB. When the CPU cannot find an entry in the TLB, it raises an exception and the OS is responsible for installing the relevant TLB entry.

The advantage of this approach is that it makes it much easier to experiment with different page-table designs, which was very important when the R4K was released in the early 1990s and virtual memory was still a very active research topic. Unfortunately, it also has several disadvantages:

  • TLB fills must happen synchronously with respect to the main pipeline: the TLB cannot be filled speculatively in anticipation of a future load.
  • The TLB-refill exception handler consumes instruction cache space.
  • Entering the TLB-refill handler causes a pipeline flush.

A successful implementation will provide a hardware walker that can read the FreeBSD page-table format and automatically fill the TLB when a valid entry for the page exists. As an extension, you could make the TLB-fill logic programmable so that it supports multiple page-table formats, or hard-code some others (for example, an inverted page table).

Evaluation should determine whether a hardware walker is faster on a simple in-order pipeline (BERI) by running a variety of workloads with and without the hardware support.

Note: This project requires familiarity with Bluespec SystemVerilog and so is probably best suited to a Part III / ACS student taking the Advanced Computer Design course.

OS Support for Garbage Collection

Prerequisites: A good knowledge of C and the ability to work with concurrent data structures using fine-grained locking

Microsoft Windows provides an API (GetWriteWatch()) that reports which pages in a region have been modified. The .NET runtime uses it, but other systems lack similar interfaces. This project will involve modifying the FreeBSD virtual memory subsystem to provide an API that allows garbage collectors to query this information, and modifying the Boehm collector to use it.

The Boehm collector provides a platform-independent interface for retrieving a list of dirty pages, with multiple implementations (the Windows API, using mprotect() to mark pages read-only and catching the resulting faults, and a few others), so the collector-side changes will be relatively small. The OS changes will be more significant. Some important considerations include:
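For comparison, the user-space technique of write-protecting the heap and catching faults can be sketched as follows. This is a minimal POSIX illustration, not the Boehm collector's code; `HEAP_PAGES`, `dirty_tracking_init()` and `reset_dirty()` are hypothetical names. Calling mprotect() from a signal handler is not guaranteed to be async-signal-safe by POSIX, although the technique relies on it working in practice.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define HEAP_PAGES 16

static unsigned char *heap;
static size_t page_size;
static volatile sig_atomic_t dirty[HEAP_PAGES];

/* Write fault on a protected heap page: record it, then unprotect the page
 * so that the faulting store is restarted and succeeds. */
static void on_write_fault(int sig, siginfo_t *si, void *ctx) {
    uintptr_t addr = (uintptr_t)si->si_addr, base = (uintptr_t)heap;
    (void)ctx;
    if (addr < base || addr >= base + HEAP_PAGES * page_size) {
        signal(sig, SIG_DFL);    /* not ours: re-fault with the default action */
        return;
    }
    size_t page = (addr - base) / page_size;
    dirty[page] = 1;
    mprotect(heap + page * page_size, page_size, PROT_READ | PROT_WRITE);
}

static bool dirty_tracking_init(void) {
    struct sigaction sa;
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    heap = mmap(NULL, HEAP_PAGES * page_size, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (heap == MAP_FAILED) return false;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_write_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    /* Some systems report protection faults as SIGBUS rather than SIGSEGV. */
    return sigaction(SIGSEGV, &sa, NULL) == 0 && sigaction(SIGBUS, &sa, NULL) == 0;
}

/* Begin a tracking interval: forget old dirty bits and write-protect the heap. */
static bool reset_dirty(void) {
    for (size_t i = 0; i < HEAP_PAGES; i++) dirty[i] = 0;
    return mprotect(heap, HEAP_PAGES * page_size, PROT_READ) == 0;
}
```

Every first write to a page after `reset_dirty()` costs a fault plus a system call, which is the overhead an OS-level API would aim to remove. A real collector must also cope with system calls such as read() writing into protected pages, which typically fail with EFAULT instead of delivering a signal.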

  • The OS uses the dirty bit to identify pages that must be written back before they can be evicted, so its notion of a dirty page is not the same as the garbage collector’s.
  • Programs use multiple threads. As a first approximation, it would be acceptable to only query for dirty bits after calling pthread_suspend_all_np().
  • For correctness in a garbage collector, it is acceptable to report a superset of the modified pages (scanning an unmodified page only hurts performance), but missing a write is a serious problem.

The implementation will most likely involve adding a counter to each page that is incremented when the page moves from clean to dirty, and a query over a range of pages that identifies whether their counters have changed since a previous call. Make sure you handle counter overflow sensibly; this can be as simple as providing a ‘reset all counters’ API to userspace.
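A small C model makes the counter bookkeeping concrete. Everything here is a hypothetical sketch rather than FreeBSD's VM interfaces: `gc_region` stands in for per-page kernel state, and `query_written()` and `reset_all_counters()` model the proposed system calls. One design choice is worth noting. If the counter followed only the pager's dirty bit, a page that is never cleaned would make a single clean-to-dirty transition and every later write would be missed. The sketch therefore keeps the pager's state separately and re-arms a modified bit on each query, which is one possible way to address the first consideration listed earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NPAGES 8
typedef uint8_t wgen_t;   /* narrow on purpose: wrap-around must be harmless */

typedef struct {
    bool modified[NPAGES];     /* pmap-level modified bit, re-armed by queries */
    bool pager_dirty[NPAGES];  /* the pager's own notion of dirty, never lost */
    wgen_t wgen[NPAGES];       /* hypothetical per-page write-generation counter */
} gc_region;

/* Called on the clean->dirty transition (e.g. from the write-fault path). */
static void on_write(gc_region *r, size_t page) {
    assert(page < NPAGES);
    if (!r->modified[page]) {
        r->modified[page] = true;
        r->wgen[page]++;       /* unsigned, so it wraps rather than overflowing */
    }
}

/* Hypothetical query call: reports pages whose counter differs from the
 * caller's snapshot, refreshes the snapshot and re-arms the modified bits.
 * Inequality (not <) is used, so wrap-around is safe while a counter moves
 * fewer than 256 steps between snapshots; re-arming once per query limits
 * it to one step. */
static size_t query_written(gc_region *r, wgen_t snap[NPAGES], bool written[NPAGES]) {
    size_t n = 0;
    for (size_t p = 0; p < NPAGES; p++) {
        if (r->modified[p]) {
            r->pager_dirty[p] = true;  /* keep paging information */
            r->modified[p] = false;    /* next write is a new transition */
        }
        written[p] = r->wgen[p] != snap[p];
        snap[p] = r->wgen[p];
        n += written[p];
    }
    return n;
}

/* Hypothetical 'reset all counters' call.  Writes since the last query are
 * forgotten, so the caller must treat every page as written once afterwards
 * (a safe superset). */
static void reset_all_counters(gc_region *r, wgen_t snap[NPAGES]) {
    for (size_t p = 0; p < NPAGES; p++) r->wgen[p] = snap[p] = 0;
}
```

The counter advances at most once per page per query interval, so the cost of a write stream is bounded by the number of queries rather than the number of stores.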

Evaluation should involve running the modified Boehm collector on some benchmarks and determining whether it provides better performance. If it does not, the evaluation should identify the overheads that offset the speedup.