Firmament is a new distributed data-flow execution engine for heterogeneous, multi-scale environments currently under development in the Systems Research group.
Four-sentence summary
Current task-parallel data-flow execution engines (such as our own Ciel) are designed to run in homogeneous data centre environments.
However, even clusters in data centres are not as homogeneous as generally assumed, and general-purpose computation takes place in even more heterogeneous environments, with parallelism at many scales, ranging from a handful of cores in a mobile device to highly powerful desktop machines.
Firmament is built around the idea of making heterogeneity explicit, actively monitoring task performance, and dynamically optimizing jobs for the environment they run in.
Having explicit information about the environment helps us to make optimal choices for communication, co-scheduling and workload partitioning, and yields superior performance on many common workloads, while also supporting new workloads, such as non-deterministic tasks, that could not previously run within a task-parallel data-flow execution engine.
The screenshots below show the Firmament web interface displaying the state of a Firmament coordinator (a distributed master node), along with the resource utilization on that machine and various topology views.
Screenshots: resource monitor; simple machine topology view; cluster topology view (4 machines).
The main code base for the Firmament engine is on GitHub.
As scientists, we love data, and so we track our progress on building Firmament here.