Department of Computer Science and Technology

Technical reports

Latency-driven performance in data centres

Diana Andreea Popescu

June 2019, 190 pages

This technical report is based on a dissertation submitted December 2018 by the author for the degree of Doctor of Philosophy to the University of Cambridge, Churchill College.

Abstract

Data-centre-based cloud computing has revolutionised the way businesses use computing infrastructure. Instead of building their own data centres, companies rent computing resources and deploy their applications on cloud hardware. Providing customers with well-defined application performance guarantees is of paramount importance to ensure transparency and to build a lasting collaboration between users and cloud operators. The performance of a user's application is constrained by the resources allocated to it and affected by the network conditions in the data centre.

In this dissertation, I argue that application performance in data centres can be improved through cluster scheduling informed both by predictions of application performance for a given network latency and by measurements of current network latency between hosts in the data centre. Firstly, I show how to use the Precision Time Protocol (PTP), through PTPd, an open-source software implementation, to measure network latency and packet loss in data centres. I propose PTPmesh, which uses PTPd, as a cloud network monitoring tool for tenants. Furthermore, I conduct a measurement study using PTPmesh in different cloud providers, finding that network latency variability in data centres is still common. Typical latency values in data centres are on the order of tens or hundreds of microseconds, while unexpected events, such as network congestion or packet loss, can lead to latency spikes on the order of milliseconds. Secondly, I show that network latency matters for certain distributed applications even when it amounts to only tens or hundreds of microseconds, and that it can significantly reduce their performance. I propose a methodology to determine the impact of network latency on distributed application performance by injecting artificial delay into the network of an experimental setup. Based on the experimental results, I build functions that predict the performance of an application for a given network latency. Given the network latency variability observed in data centres, an application's performance depends on where it is placed within the data centre. Thirdly, I propose latency-driven, application-performance-aware cluster scheduling as a way to provide performance guarantees to applications. I introduce NoMora, a cluster scheduling architecture that places applications by combining predictions of application performance as a function of network latency with dynamic network latency measurements taken between pairs of hosts in the data centre. Moreover, I show that NoMora improves application performance by choosing better placements than other scheduling policies.
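The pipeline described above (measure latency with PTP, predict performance from latency, place the application accordingly) can be illustrated with a toy sketch. The offset and delay arithmetic is the standard IEEE 1588 delay request-response computation, which assumes a symmetric network path. Everything else is hypothetical: `predict_performance` is a placeholder model with an assumed sensitivity `alpha`, not one of the functions fitted in the report, and `choose_host` assumes the slowest link to the application's peers dominates. Neither reflects how PTPmesh or NoMora are actually implemented.

```python
"""Toy sketch: PTP-style latency measurement feeding a latency-aware placement.

All names are hypothetical; the prediction model is a placeholder, not the
performance functions built in the report.
"""
from typing import Callable, Dict, List, Tuple


def ptp_offset_and_delay(t1: float, t2: float, t3: float, t4: float) -> Tuple[float, float]:
    """IEEE 1588 delay request-response exchange, assuming a symmetric path.

    t1: master sends Sync (master clock)
    t2: slave receives Sync (slave clock)
    t3: slave sends Delay_Req (slave clock)
    t4: master receives Delay_Req (master clock)
    """
    forward = t2 - t1   # one-way delay + clock offset
    backward = t4 - t3  # one-way delay - clock offset
    delay = (forward + backward) / 2
    offset = (forward - backward) / 2
    return offset, delay


def predict_performance(latency_us: float, alpha: float = 0.002) -> float:
    """Hypothetical model: relative performance (1.0 = no added latency)
    decays as latency grows. alpha is an assumed sensitivity, not measured."""
    return 1.0 / (1.0 + alpha * latency_us)


def choose_host(
    candidates: List[str],
    peers: List[str],
    latency_us: Dict[Tuple[str, str], float],
    predict: Callable[[float], float] = predict_performance,
) -> str:
    """Pick the candidate whose worst-case latency to the application's peers
    yields the highest predicted performance (assumption: slowest link dominates)."""
    def score(host: str) -> float:
        worst = max(latency_us[(host, p)] for p in peers)
        return predict(worst)
    return max(candidates, key=score)


if __name__ == "__main__":
    # Timestamps in microseconds: slave clock 20 us ahead, true one-way delay 40 us.
    offset, delay = ptp_offset_and_delay(0.0, 60.0, 100.0, 120.0)
    print(offset, delay)  # 20.0 40.0

    lat = {
        ("h1", "p1"): 40.0, ("h1", "p2"): 900.0,  # h1 sees a millisecond-scale spike to p2
        ("h2", "p1"): 80.0, ("h2", "p2"): 90.0,
    }
    print(choose_host(["h1", "h2"], ["p1", "p2"], lat))  # h2
```

In this example `h1` has the lowest single latency, but its spike towards `p2` makes its predicted performance worse than that of `h2`, which is the kind of placement decision that latency variability forces on a scheduler.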

Full text

PDF (21.5 MB)

BibTeX record

@TechReport{UCAM-CL-TR-937,
  author =	 {Popescu, Diana Andreea},
  title = 	 {{Latency-driven performance in data centres}},
  year = 	 2019,
  month = 	 jun,
  url = 	 {https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-937.pdf},
  institution =  {University of Cambridge, Computer Laboratory},
  number = 	 {UCAM-CL-TR-937}
}