Firmament is a new cluster scheduler for warehouse-scale computers. It makes high-quality placement decisions at low scheduling delay.
To make good placement decisions, Firmament models the scheduling problem as a minimum-cost optimisation over a flow network (as in Quincy). The optimisation considers all possible assignments at the same time, and thus yields optimal results for a given cost model.
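To make the flow-network formulation more concrete, here is a toy sketch in Python. It is not Firmament's code. The node layout, the integer arc costs, the textbook successive-shortest-path solver and the `schedule` helper are all illustrative assumptions. The layout has one node per task, one per machine, and an "unscheduled" aggregator loosely modelled on Quincy's. Each task pushes one unit of flow to the sink. It either passes through a machine, paying that placement's cost, or through the unscheduled node, paying a penalty. The solver then picks the cheapest combination for all tasks at once.

```python
from __future__ import annotations

from dataclasses import dataclass

INF = 10**18


@dataclass
class Edge:
    to: int
    cap: int
    cost: int
    rev: int  # index of the paired reverse edge in graph[to]


class MinCostFlow:
    """Textbook successive shortest paths, using Bellman-Ford on the residual graph."""

    def __init__(self, n: int) -> None:
        self.graph: list[list[Edge]] = [[] for _ in range(n)]

    def add_edge(self, u: int, v: int, cap: int, cost: int) -> None:
        self.graph[u].append(Edge(v, cap, cost, len(self.graph[v])))
        self.graph[v].append(Edge(u, 0, -cost, len(self.graph[u]) - 1))

    def solve(self, s: int, t: int) -> tuple[int, int]:
        n = len(self.graph)
        flow = total_cost = 0
        while True:
            dist = [INF] * n
            prev = [(-1, -1)] * n  # (node, edge index) used to reach each node
            dist[s] = 0
            for _ in range(n - 1):  # Bellman-Ford relaxation rounds
                changed = False
                for u in range(n):
                    if dist[u] == INF:
                        continue
                    for i, e in enumerate(self.graph[u]):
                        if e.cap > 0 and dist[u] + e.cost < dist[e.to]:
                            dist[e.to] = dist[u] + e.cost
                            prev[e.to] = (u, i)
                            changed = True
                if not changed:
                    break
            if dist[t] == INF:  # no augmenting path left: flow is maximal
                return flow, total_cost
            # Bottleneck capacity along the cheapest path, then augment along it.
            push, v = INF, t
            while v != s:
                u, i = prev[v]
                push = min(push, self.graph[u][i].cap)
                v = u
            v = t
            while v != s:
                u, i = prev[v]
                e = self.graph[u][i]
                e.cap -= push
                self.graph[v][e.rev].cap += push
                v = u
            flow += push
            total_cost += push * dist[t]


def schedule(costs: list[list[int | None]], slots: list[int],
             unscheduled_cost: int) -> tuple[dict[int, int], int]:
    """Hypothetical helper. costs[t][m] is the cost of task t on machine m
    (None = not allowed). Returns ({task: machine}, total cost); unscheduled
    tasks are omitted from the mapping."""
    n_tasks, n_machines = len(costs), len(slots)
    src, first_machine = 0, 1 + n_tasks  # task t is node 1 + t
    unsched = first_machine + n_machines  # "unscheduled" aggregator node
    sink = unsched + 1
    mcf = MinCostFlow(sink + 1)
    for t in range(n_tasks):
        mcf.add_edge(src, 1 + t, 1, 0)  # each task supplies one unit of flow
        mcf.add_edge(1 + t, unsched, 1, unscheduled_cost)
        for m, c in enumerate(costs[t]):
            if c is not None:
                mcf.add_edge(1 + t, first_machine + m, 1, c)
    for m, cap in enumerate(slots):
        mcf.add_edge(first_machine + m, sink, cap, 0)  # machine slot capacity
    mcf.add_edge(unsched, sink, n_tasks, 0)
    _, total = mcf.solve(src, sink)
    placement: dict[int, int] = {}
    for t in range(n_tasks):
        for e in mcf.graph[1 + t]:
            # A saturated task->machine arc carries this task's unit of flow.
            if first_machine <= e.to < unsched and e.cap == 0:
                placement[t] = e.to - first_machine
    return placement, total


if __name__ == "__main__":
    print(schedule([[1, 2], [1, 9]], slots=[1, 1], unscheduled_cost=100))
    # prints ({0: 1, 1: 0}, 3)
```

A greedy scheduler that placed task 0 first would take machine 0 (cost 1) and leave task 1 on machine 1 (cost 9), for a total of 10. Solving for both tasks together finds the placement with total cost 3.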
Previous work considers the goals of optimality and fast decision time to be mutually incompatible; with Firmament, I show that they are not. Firmament makes scheduling decisions within seconds even on clusters of tens of thousands of machines, while offering the same expressivity in terms of scheduling policies as previous schedulers.
What's it good for?
Most current data centre schedulers assume a homogeneous data centre environment: they treat all placement choices as equally good.
However, data centres are not as homogeneous as one might think: different machine types are mixed within the same cluster, and co-located tasks compete for resources, which leads to negative interference.
Firmament is built around the idea of making heterogeneity explicit: it actively monitors task performance and uses the information collected to make better decisions in the future.
Having explicit information about the environment helps us to make optimal choices for communication, co-scheduling and workload partitioning, and yields superior performance on many common workloads.
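The sketch below is a rough illustration of how monitored performance could feed placement decisions. It keeps a running average of observed runtimes per (task class, machine type) pair. It also inflates the expected runtime when tasks of the same class already share the machine, as a crude stand-in for interference. The `PerformanceModel` and `Machine` classes, the linear interference penalty and the machine-type names are hypothetical. This is not one of Firmament's actual cost models. In a flow-based scheduler, values like these would serve as the costs on task-to-machine arcs.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    machine_type: str  # hypothetical label, e.g. "old-xeon"
    running: list[str] = field(default_factory=list)  # task classes already placed here


class PerformanceModel:
    """Hypothetical cost model (not Firmament's API) that learns from observed runtimes."""

    def __init__(self, default_runtime: float, interference_penalty: float) -> None:
        self.default_runtime = default_runtime  # used before any samples exist
        self.interference_penalty = interference_penalty
        self.totals: defaultdict[tuple[str, str], float] = defaultdict(float)
        self.counts: defaultdict[tuple[str, str], int] = defaultdict(int)

    def record(self, task_class: str, machine_type: str, runtime: float) -> None:
        """Feed back a measured runtime from the monitoring layer."""
        key = (task_class, machine_type)
        self.totals[key] += runtime
        self.counts[key] += 1

    def cost(self, task_class: str, machine: Machine) -> float:
        key = (task_class, machine.machine_type)
        n = self.counts.get(key, 0)
        expected = self.totals[key] / n if n else self.default_runtime
        # Assumed linear penalty for each co-located task of the same class.
        contenders = machine.running.count(task_class)
        return expected * (1 + self.interference_penalty * contenders)

    def best_machine(self, task_class: str, machines: list[Machine]) -> Machine:
        return min(machines, key=lambda m: self.cost(task_class, m))


if __name__ == "__main__":
    model = PerformanceModel(default_runtime=100.0, interference_penalty=0.5)
    model.record("batch", "old-xeon", 120.0)
    model.record("batch", "new-xeon", 60.0)
    idle = [Machine("m0", "old-xeon"), Machine("m1", "new-xeon")]
    print(model.best_machine("batch", idle).name)  # prints m1 (60.0 vs 120.0)
    busy = [Machine("m0", "old-xeon"), Machine("m1", "new-xeon", ["batch"] * 3)]
    print(model.best_machine("batch", busy).name)  # prints m0 (120.0 vs 150.0)
```

A homogeneous scheduler would treat m0 and m1 as interchangeable. With observed runtimes and co-location taken into account, the preferred machine changes as the cluster's state changes.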
The screenshots below show the Firmament web interface displaying the state of a Firmament scheduler node, along with the resource utilisation on a machine and various topology views.
Screenshots: Resource monitor | Simple machine topology view | Cluster topology view (4 machines)
The code for Firmament is on GitHub.