Department of Computer Science and Technology

Technical reports

Application identification in data centres: a traffic driven approach

Mihnea-Stefan Popeanga

August 2025, 72 pages

This technical report is based on a dissertation submitted in June 2025 by the author for the degree of Master of Philosophy (Advanced Computer Science) to the University of Cambridge, St Edmund's College.

DOI: https://doi.org/10.48456/tr-1000

Abstract

Modern data centre (DC) operators cannot tune and secure what they cannot see. However, application identification from network traces is held back by two obstacles. First, public packet captures are scarce because commercial workloads and user data are confidential. Second, the few datasets that exist do not focus on DC-specific workloads and do not allow others to reproduce the experiments. This dissertation tackles both issues. I designed and implemented an end-to-end framework that systematically captures traffic with nanosecond timestamps, demultiplexes it into flows, and computes a set of 203 features. Each flow is coupled with extensive metadata detailing the exact setup that generated the traffic, allowing any researcher to reproduce the experiments under identical conditions. Using this workflow, I created the first public DC-focused dataset, unencumbered by personal or confidential information, that spans three representative workloads. Machine learning classification demonstrates the utility of the data: traditional feature-based models achieve perfect accuracy when identifying the three workloads. A core novelty is that the collected data includes significant metadata, which supports tasks beyond strict identification. To demonstrate this, I also tackled performance estimation for one of the workloads: a 1D CNN can distinguish between flows corresponding to different performance metrics with an accuracy of 95%.
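
To make the flow demultiplexing and feature computation steps concrete, the following is a minimal Python sketch. It is not the framework described in the report. The packet record layout, the function names, and the five features shown are assumptions chosen for illustration, and they are not drawn from the report's set of 203 features.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical packet record (an assumption, not the report's format):
# (timestamp_ns, src_ip, src_port, dst_ip, dst_port, proto, length_bytes)


def flow_key(pkt):
    """Return a direction-independent 5-tuple key for a packet."""
    _, src_ip, src_port, dst_ip, dst_port, proto, _ = pkt
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    # Sorting the two endpoints maps both directions to the same flow.
    return (min(a, b), max(a, b), proto)


def demultiplex(packets):
    """Group packets into flows keyed by their 5-tuple."""
    flows = defaultdict(list)
    for pkt in packets:
        flows[flow_key(pkt)].append(pkt)
    return flows


def flow_features(pkts):
    """Compute a few illustrative per-flow features."""
    pkts = sorted(pkts, key=lambda p: p[0])  # order by timestamp
    times = [p[0] for p in pkts]
    sizes = [p[6] for p in pkts]
    gaps = [t2 - t1 for t1, t2 in zip(times, times[1:])]
    return {
        "packet_count": len(pkts),
        "total_bytes": sum(sizes),
        "mean_packet_size": mean(sizes),
        "duration_ns": times[-1] - times[0],
        "mean_interarrival_ns": mean(gaps) if gaps else 0,
    }
```

Calling demultiplex on a list of packet records and then flow_features on each resulting flow yields one feature dictionary per flow, which could then be fed to a feature-based classifier.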

Full text

PDF (3.4 MB)

BibTeX record

@TechReport{UCAM-CL-TR-1000,
  author =	 {Popeanga, Mihnea-Stefan},
  title = 	 {{Application identification in data centres: a traffic
         	   driven approach}},
  year = 	 2025,
  month = 	 aug,
  url = 	 {https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-1000.pdf},
  institution =  {University of Cambridge, Computer Laboratory},
  doi = 	 {10.48456/tr-1000},
  number = 	 {UCAM-CL-TR-1000}
}