Computer Laboratory

Technical reports

Discovering and exploiting parallelism in DOACROSS loops

Niall Murphy

March 2016, 129 pages

This technical report is based on a dissertation submitted September 2015 by the author for the degree of Doctor of Philosophy to the University of Cambridge, Darwin College.

Abstract

Although multicore processors have been the norm for a decade, programmers still struggle to write parallel general-purpose applications, resulting in underutilised on-chip resources. Automatic parallelisation is a promising approach to improving the performance of such applications without burdening the programmer. I explore various techniques for automatically extracting parallelism which span the spectrum from conservative parallelisation, where transformations must be proven correct by the compiler, to optimistic parallelisation, where speculative execution allows the compiler to generate potentially unsafe code, with runtime supports to ensure correct execution.

Firstly I present a limit study of conservative parallelisation. I use a novel runtime profiling technique to find all data dependences which occur during execution and build an oracle data dependence graph. This oracle is used as input to the HELIX compiler which automatically generates parallelised code. In this way, the compiler is no longer limited by the inadequacies of compile-time dependence analysis and the performance of the generated code represents the upper limit of what can be achieved by HELIX-style parallelisation. I show that, despite shortcomings in the compile-time analysis, the oracle-parallelised code is no better than that ordinarily produced by HELIX.

Secondly I present a limit study of optimistic parallelisation. I implement a dataflow timing model that allows each instruction to execute as early as possible in an idealistic, zero-overhead machine, thus giving a theoretical limit to the parallelism which can be exploited. The study shows that considerable extra parallelism is available which, due to its dynamic nature, cannot be exploited by HELIX, even with the oracle dependence analysis.

Finally I demonstrate the design of a practical parallelisation system which combines the best aspects of both the conservative and optimistic parallelisation styles. I use runtime profiling to detect which HELIX-identified dependences cause frequent conflicts and synchronise these while allowing other code to run speculatively. This judicious speculation model achieves superior performance to HELIX and approaches the theoretical limit in many cases.

Full text

PDF (2.4 MB)

BibTeX record

@TechReport{UCAM-CL-TR-882,
  author =	 {Murphy, Niall},
  title = 	 {{Discovering and exploiting parallelism in DOACROSS loops}},
  year = 	 2016,
  month = 	 mar,
  url = 	 {http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-882.pdf},
  institution =  {University of Cambridge, Computer Laboratory},
  number = 	 {UCAM-CL-TR-882}
}