Computer Laboratory


Firmament

Firmament is a new distributed data-flow execution engine for heterogeneous, multi-scale environments, currently under development in the Systems Research Group.

Four-sentence summary

Current task-parallel data-flow execution engines (such as our own Ciel) are designed to run in homogeneous data centre environments.

Even clusters in data centres are less homogeneous than is generally assumed, however, and general-purpose computation takes place in still more heterogeneous environments, with parallelism at many scales: from a handful of cores in a mobile device to powerful desktop machines.

Firmament is built around the idea of making heterogeneity explicit, actively monitoring task performance, and dynamically optimizing jobs for the environment they run in.
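This page does not say how Firmament measures task performance. As a minimal sketch of what "actively monitoring task performance" could look like, the hypothetical `ThroughputMonitor` below (not part of Firmament) keeps a per-machine estimate of task throughput, in items per second, as an exponentially weighted moving average (EWMA), so that recent measurements count more than old ones. The smoothing factor `alpha` is an illustrative assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ThroughputMonitor:
    """Hypothetical per-machine monitor (not Firmament's API): tracks an
    exponentially weighted moving average of task throughput (items/s)."""

    alpha: float = 0.3  # assumed weight given to the newest observation
    estimates: dict[str, float] = field(default_factory=dict)

    def record(self, machine: str, items: int, seconds: float) -> float:
        """Fold one completed task's measurement into the machine's estimate."""
        if seconds <= 0:
            raise ValueError("task duration must be positive")
        observed = items / seconds
        previous = self.estimates.get(machine)
        if previous is None:
            # First observation for this machine: use it directly.
            self.estimates[machine] = observed
        else:
            # Blend the new observation with the running estimate.
            self.estimates[machine] = (
                self.alpha * observed + (1 - self.alpha) * previous
            )
        return self.estimates[machine]
```

For example, recording a task of 100 items in 10 s and then one of 200 items in 10 s on the same machine moves its estimate from 10 items/s to roughly 13 items/s. An EWMA is one simple choice here; it reacts to changing conditions without storing a full history of measurements.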

Explicit information about the environment lets us make better-informed choices about communication, co-scheduling and workload partitioning. This yields better performance on many common workloads and also supports workloads that could not previously run within a task-parallel data-flow execution engine, such as non-deterministic tasks.
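To illustrate how explicit environment information can inform workload partitioning, here is a minimal sketch, assuming each machine's throughput has already been estimated. It is not Firmament's actual partitioning scheme. The hypothetical `partition_work` function splits a job's items across machines in proportion to their throughput and uses largest-remainder rounding so that the integer shares always add up to the total.

```python
import math


def partition_work(total_items: int, throughput: dict[str, float]) -> dict[str, int]:
    """Hypothetical helper: split total_items across machines in proportion
    to their estimated throughput, using largest-remainder rounding so the
    integer shares sum exactly to total_items."""
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    if not throughput or any(t <= 0 for t in throughput.values()):
        raise ValueError("need at least one machine, all with positive throughput")
    capacity = sum(throughput.values())
    # Ideal (fractional) share for each machine.
    ideal = {m: total_items * t / capacity for m, t in throughput.items()}
    shares = {m: math.floor(x) for m, x in ideal.items()}
    leftover = total_items - sum(shares.values())
    # Give the remaining items to the machines with the largest fractional
    # parts; ties keep the input order because sorted() is stable.
    by_remainder = sorted(ideal, key=lambda m: ideal[m] - shares[m], reverse=True)
    for m in by_remainder[:leftover]:
        shares[m] += 1
    return shares
```

For example, `partition_work(10, {"phone": 1.0, "desktop": 4.0})` assigns 2 items to the phone and 8 to the desktop. A uniform split would assign 5 items to each and leave the faster machine idle while the slower one finishes.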

Sneak peek

The screenshots below show the Firmament web interface displaying the state of a Firmament coordinator (a distributed master node), along with the resource utilization on this machine and various topology views.

Resource monitor
Simple machine topology view
Cluster topology view (4 machines)

Code

The main code base for the Firmament engine is on GitHub.

Statistics

As scientists, we love data, and so we track our progress on building Firmament here.

Lines of code

Test cases