Optimisation of a modern numerical library: a bottom-up approach

Jianxin Zhao

April 2021, 96 pages

This technical report is based on a dissertation submitted September 2019 by the author for the degree of Doctor of Philosophy to the University of Cambridge, Corpus Christi College.

DOI: 10.48456/tr-956

Abstract

Numerical libraries lie at the heart of modern applications, from machine learning and scientific computation to the Internet of Things (IoT), and they shape many aspects of our daily lives. Numerical libraries used to sit at the low level of applications and only needed to focus on providing fast calculation. However, with rising social awareness of privacy and personal data, computation is gradually moving to devices in heterogeneous environments. The recent development of edge devices such as the Edge TPU also promotes a trend towards decentralised computation. Given this trend, a new understanding of the full stack of computation is required to optimise computation at various levels.

In this thesis, based on my experience participating in the development of a numerical library, I present a bottom-up approach that centres on the numerical library to describe the optimisation of computation at various levels. I present the low-level design of numerical operations and show its impact on performance optimisation. I create new algorithms for key operations and build an automatic tuning module to further improve performance. At the graph level, where a computation consists of multiple operations, I present the idea of using the graph as a common Intermediate Representation to enable interoperability with other computation frameworks and devices. To demonstrate this approach, I build the TFgraph system, which provides a symbolic representation layer to exchange computation between Owl and other frameworks. At a higher level, the computation graph can be seen as a unit, and at this level I identify the problems of computation composition and deployment. I build the Zoo system to address these two problems. It provides a small Domain-specific Language (DSL) to enable the composition of advanced data analytics services. Benefiting from OCaml’s powerful type system, Zoo provides type checking for the composition. In addition, the Zoo DSL supports fine-grained version control in composition, and Zoo also supports deploying composed services to multiple backends. Finally, the top level involves the collaboration of multiple deployed computations. At this level, I focus on barrier control methods, propose two quantitative metrics to evaluate existing barrier control methods, and bring new insights into their design. I have also built a simulation platform and conducted real-world experiments to perform a thorough evaluation of PSP compared to existing barrier methods.

Full text

PDF (4.0 MB)

BibTeX record

@TechReport{UCAM-CL-TR-956,
  author =      {Zhao, Jianxin},
  title =       {{Optimisation of a modern numerical library: a bottom-up approach}},
  year =        2021,
  month =       apr,
  url =         {https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-956.pdf},
  institution = {University of Cambridge, Computer Laboratory},
  doi =         {10.48456/tr-956},
  number =      {UCAM-CL-TR-956}
}