Computer Laboratory

Technical reports

An open parallel architecture for data-intensive applications

Mohamad Afshar

July 1999, 225 pages

This technical report is based on a dissertation submitted December 1998 by the author for the degree of Doctor of Philosophy to the University of Cambridge, King’s College.


Data-intensive applications consist of both declarative data-processing parts and imperative computational parts. For applications such as climate modelling, scale affects both the computational aspects, which are typically handled in a procedural programming language, and the data-processing aspects, which are handled in a database query language. Although parallelism has been exploited successfully in the data-processing parts through parallel evaluation of the application's database queries, current database query languages are poor at expressing the computational aspects, which are also subject to scale.

This thesis proposes an open architecture that delivers parallelism shared between the database, system and application, thus enabling the integration of the conventionally separated query and non-query components of a data-intensive application. The architecture is data-model independent and can be used in a variety of application areas, including decision-support applications, which are query-based, and complex applications, which comprise procedural language statements with embedded queries. The architecture encompasses a unified model of parallelism and the realisation of this model in the form of a language within which it is possible to describe both the query and non-query components of data-intensive applications. The language enables the construction of parallel applications by the hierarchical composition of platform-independent parallel forms, each of which implements a form of task or data parallelism. These forms can be used to express both query and non-query actions.
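The hierarchical composition of parallel forms can be pictured with a small sketch. The Python below is illustrative only and is not the report's language: the form names `pipeline`, `indep` and `data_par` are hypothetical, threads stand in for the cells of a parallel machine, and each form is simply a function, so forms nest inside one another and a query part and non-query parts can be combined in one program.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical parallel forms (not the report's actual language). Each form is
# a plain function, so forms can be nested hierarchically inside one another.

_DONE = object()  # end-of-stream marker passed between pipeline stages


def pipeline(*stages):
    """Task parallelism: each stage runs in its own thread and streams results
    to the next stage through a queue, so stages overlap on different items."""
    def run(items):
        qs = [queue.Queue() for _ in range(len(stages) + 1)]

        def worker(stage, inq, outq):
            while (item := inq.get()) is not _DONE:
                outq.put(stage(item))
            outq.put(_DONE)  # propagate end of stream downstream

        threads = [threading.Thread(target=worker, args=(s, qs[i], qs[i + 1]))
                   for i, s in enumerate(stages)]
        for t in threads:
            t.start()
        for item in items:
            qs[0].put(item)
        qs[0].put(_DONE)
        results = []
        while (item := qs[-1].get()) is not _DONE:
            results.append(item)
        for t in threads:
            t.join()
        return results
    return run


def indep(*forms):
    """Task parallelism: run unrelated forms concurrently on the same input."""
    def run(x):
        with ThreadPoolExecutor() as pool:
            futures = [pool.submit(f, x) for f in forms]
            return tuple(fut.result() for fut in futures)
    return run


def data_par(body):
    """Data parallelism: apply `body` to every partition of a partitioned input."""
    def run(partitions):
        with ThreadPoolExecutor() as pool:
            return list(pool.map(body, partitions))
    return run


# A query part (per-partition sum of even values) composed with non-query parts.
app = pipeline(
    data_par(lambda part: sum(v for v in part if v % 2 == 0)),
    sum,
    indep(lambda total: total * 2, lambda total: total + 1),
)
print(app([[[1, 2, 3, 4], [5, 6], [7, 8, 9]], [[10], [11, 12]]]))
# prints [(40, 21), (44, 23)]
```

Because every form has the same shape (a function from input to output), a data-parallel form can sit inside a pipeline stage and an independent form can follow it, which mirrors, in miniature, the idea of building an application by nesting task- and data-parallel forms.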

Queries are expressed in a declarative language based on “monoid comprehensions”. The approach of using monoids to model data types and monoid homomorphisms to iterate over collection types enables mathematically provable compile-time optimisations whilst also facilitating multiple collection types and data type extensibility. Monoid comprehension programs are automatically transformed into parallel programs composed of applications of the parallel forms, one of which is the “monoid homomorphism”. This process involves identifying the parts of a query where task and data parallelism are available and mapping that parallelism onto the most suitable form. Data parallelism in queries is mapped onto a form that implements combining tree parallelism for query evaluation and dividing tree parallelism to realise data partitioning. Task parallelism is mapped onto two separate forms that implement pipeline and independent parallelism. This translation process is applied to all comprehension queries, including those embedded in complex applications. The result is a skeleton program in which both the query and non-query parts are expressed within a single language. Expressions in this language are amenable to the application of optimising skeleton rewrite rules.
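A minimal sketch of how a monoid homomorphism can be evaluated with dividing-tree partitioning and combining-tree reduction follows. It assumes a monoid is represented by an identity element `zero` and an associative `merge`; the names `Monoid`, `hom`, `divide`, `combine` and `par_hom` are hypothetical, and Python threads stand in for the cells of the parallel machine. Because `merge` is associative and `zero` is its identity, partial results from the partitions can be merged in any tree shape as long as partition order is kept; set-like collection monoids would additionally need a commutative and idempotent merge.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Generic, List, Sequence, TypeVar

A = TypeVar("A")
B = TypeVar("B")


@dataclass(frozen=True)
class Monoid(Generic[B]):
    zero: B                     # identity element of merge
    merge: Callable[[B, B], B]  # associative binary operation


def hom(f: Callable[[A], B], m: Monoid[B], xs: Sequence[A]) -> B:
    """Sequential homomorphism: merge f(x) over all x, starting from zero."""
    acc = m.zero
    for x in xs:
        acc = m.merge(acc, f(x))
    return acc


def divide(xs: Sequence[A], depth: int) -> List[Sequence[A]]:
    """Dividing tree: split the input in halves, `depth` levels deep."""
    if depth == 0 or len(xs) < 2:
        return [xs]
    mid = len(xs) // 2
    return divide(xs[:mid], depth - 1) + divide(xs[mid:], depth - 1)


def combine(m: Monoid[B], parts: List[B]) -> B:
    """Combining tree: merge neighbouring partial results level by level."""
    if not parts:
        return m.zero
    while len(parts) > 1:
        parts = [m.merge(parts[i], parts[i + 1]) if i + 1 < len(parts) else parts[i]
                 for i in range(0, len(parts), 2)]
    return parts[0]


def par_hom(f, m, xs, depth=2):
    """Evaluate hom on each leaf partition concurrently, then combine in a tree."""
    leaves = divide(xs, depth)
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda part: hom(f, m, part), leaves))
    return combine(m, partials)


SUM = Monoid(0, lambda a, b: a + b)      # sum monoid
LIST = Monoid((), lambda a, b: a + b)    # list (concatenation) monoid, order kept

tracks = [("a", 12), ("b", 7), ("c", 30), ("d", 11), ("e", 3)]
# "sum of prices over 10" and "titles with price over 10" as homomorphisms
total = par_hom(lambda t: t[1] if t[1] > 10 else 0, SUM, tracks)
names = par_hom(lambda t: (t[0],) if t[1] > 10 else (), LIST, tracks)
print(total, names)  # 53 ('a', 'c', 'd')
```

The filter in each query is folded into `f`, which returns the monoid's `zero` for rejected elements; the same `par_hom` then serves both a scalar aggregate and a collection-valued result simply by swapping the monoid.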

A complete prototype of the decision-support architecture has been constructed on a 128-cell MIMD parallel computer. The utility of the query framework is demonstrated by modelling part of OQL and a substantial subset of SQL. The system is evaluated for query speedup under a number of hardware configurations using a large music catalogue database. The results show that the implementation delivers the expected performance gains while offering a convenient definition of the parallel environment.
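Query speedup is conventionally defined as below, where T(p) is the elapsed time to evaluate a query on p cells and E(p) is the corresponding parallel efficiency. This is the standard definition and is assumed here; the report's exact measurement methodology is not reproduced in this abstract.

```latex
S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}
```

Linear speedup corresponds to S(p) = p, that is, E(p) = 1.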

BibTeX record

@TechReport{UCAM-CL-TR-459,
  author =	 {Afshar, Mohamad},
  title = 	 {{An open parallel architecture for data-intensive
             applications}},
  year = 	 1999,
  month = 	 jul,
  url = 	 {},
  institution =  {University of Cambridge, Computer Laboratory},
  number = 	 {UCAM-CL-TR-459}
}