Concurrency theory and real-world multiprocessors have developed in parallel for the last 50 years, from their beginnings in the mid-1960s. Both have been very productive: concurrency theory has given us a host of models, calculi, and proof techniques, while engineered multiprocessors are now ubiquitous, from 2-8 core smartphones and laptops through to servers with 1024 or more hardware threads. But the two fields have scarcely communicated, and the shared-memory interaction primitives offered by those mainstream multiprocessors are very different from the theoretical models that have been heavily studied.
We rely on a computational infrastructure that is a densely intertwined mass of software and hardware: programming languages, network protocols, operating systems, and processors. It has accumulated great complexity, from a combination of engineering design decisions, contingent historical choices, and sheer scale, yet it is defined at best by prose specifications, or, all too often, just by the common implementations. Can we do better? More specifically, can we apply rigorous methods to this mainstream infrastructure, taking the accumulated complexity seriously, and if we do, does it help? My colleagues and I have looked at these questions in several contexts: the TCP/IP network protocols with their Sockets API; programming language design, including the Java module system and the C11/C++11 concurrency model; the hardware concurrency behaviour of x86, IBM POWER, and ARM multiprocessors; and compilation of concurrent code.
In this talk I will draw some lessons from what did and did not succeed, looking especially at the empirical nature of some of the work, at the social process of engagement with the various different communities, and at the mathematical and software tools we used. Domain-specific modelling languages (based on functional programming ideas) and proof assistants were invaluable for working with the large and loose specifications involved: idioms within HOL4 for TCP, our Ott tool for programming language specification, and Owens's Lem tool for portable semantic definitions, with HOL4, Isabelle, and Coq, for the relaxed-memory concurrency semantics work. Our experience with these suggests something of what is needed to make full-scale rigorous semantics a commonplace reality.
This report documents the programme and the outcomes of Dagstuhl Seminar 11011 "Multi-Core Memory Models and Concurrency Theory".
Multiprocessors and high-level languages generally provide only relaxed (non-sequentially-consistent) memory models, to permit performance optimisations. One has to understand these models to program reliable concurrent systems, but they are typically ambiguous and incomplete informal-prose documents, sometimes give guarantees that are too weak to be useful, and are sometimes simply unsound. Based on our previous work, we review various problems with some current specifications, for x86 (Intel 64/IA32 and AMD64), and Power and ARM processors, and for the Java and C++ languages. We argue that such specifications should be rigorously defined and tested.
Multiprocessors and high-level concurrent languages generally provide only relaxed (non-sequentially-consistent) memory models, to permit performance optimisations. One has to understand these models to program reliable concurrent systems but, despite work in this area over many years, the specifications of real-world multiprocessors and languages are typically ambiguous and incomplete informal-prose documents, cannot be used for testing hardware or software, sometimes give guarantees that are too weak to be useful, and are sometimes simply unsound. Such informal prose is a very poor medium for loose specifications. This talk will review various problems with some current specifications, for x86 (Intel 64/IA32 and AMD64), and Power and ARM processors, and for the Java and C++ languages, and describe ongoing work to produce rigorously defined specifications for some of these.
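To make the gap between sequential consistency and a relaxed model concrete, the following is a minimal illustrative sketch, not taken from any of the specifications discussed. It enumerates the outcomes of the classic store-buffering litmus test under two toy models: plain interleaving (sequential consistency), and a simplified model in which each thread has a FIFO store buffer. The operation encoding and the names `THREADS`, `sc_outcomes`, and `buffered_outcomes` are assumptions introduced here for illustration, and the buffered model is deliberately much simpler than a faithful model of any real processor.

```python
from itertools import permutations

# Store-buffering (SB) litmus test; x and y are shared and initially 0.
# ("W", var, value) writes a value; ("R", var, reg) reads var into a register.
THREADS = [
    [("W", "x", 1), ("R", "y", "r0")],  # thread 0: x = 1; r0 = y
    [("W", "y", 1), ("R", "x", "r1")],  # thread 1: y = 1; r1 = x
]

def sc_outcomes(threads):
    """Final (r0, r1) values reachable under sequential consistency."""
    tags = [tid for tid, ops in enumerate(threads) for _ in ops]
    results = set()
    # Each distinct ordering of thread ids is one interleaving that
    # preserves every thread's program order.
    for order in set(permutations(tags)):
        pcs = [0] * len(threads)
        mem, regs = {"x": 0, "y": 0}, {}
        for tid in order:
            kind, var, arg = threads[tid][pcs[tid]]
            pcs[tid] += 1
            if kind == "W":
                mem[var] = arg
            else:
                regs[arg] = mem[var]
        results.add((regs["r0"], regs["r1"]))
    return results

def buffered_outcomes(threads):
    """Final (r0, r1) values when every thread has a FIFO store buffer."""
    results = set()

    def explore(pcs, bufs, mem, regs):
        finished = True
        for tid, ops in enumerate(threads):
            if bufs[tid]:
                # Flush this thread's oldest buffered write to shared memory.
                finished = False
                var, val = bufs[tid][0]
                flushed = bufs[:tid] + (bufs[tid][1:],) + bufs[tid + 1:]
                explore(pcs, flushed, {**mem, var: val}, regs)
            if pcs[tid] < len(ops):
                # Execute this thread's next instruction.
                finished = False
                kind, var, arg = ops[pcs[tid]]
                next_pcs = pcs[:tid] + (pcs[tid] + 1,) + pcs[tid + 1:]
                if kind == "W":
                    # Writes go into the local buffer, not straight to memory.
                    grown = bufs[tid] + ((var, arg),)
                    next_bufs = bufs[:tid] + (grown,) + bufs[tid + 1:]
                    explore(next_pcs, next_bufs, mem, regs)
                else:
                    # Reads see the newest matching buffered write, else memory.
                    val = mem[var]
                    for bvar, bval in bufs[tid]:
                        if bvar == var:
                            val = bval
                    explore(next_pcs, bufs, mem, {**regs, arg: val})
        if finished:
            results.add((regs["r0"], regs["r1"]))

    n = len(threads)
    explore((0,) * n, ((),) * n, {"x": 0, "y": 0}, {})
    return results

if __name__ == "__main__":
    print("SC:      ", sorted(sc_outcomes(THREADS)))
    print("buffered:", sorted(buffered_outcomes(THREADS)))
```

In this sketch the outcome r0 = 0 and r1 = 0 never arises under interleaving, but does arise once writes may sit in a store buffer while later reads proceed. An executable model of this kind, however crude, can be compared against observed hardware behaviour, which is exactly what an informal-prose specification does not allow.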