C Concurrency

More details for the POPL 16 paper

Papers

[-] Jean Pichon-Pharabod and Peter Sewell. A concurrency semantics for relaxed atomics that permits optimisation and avoids thin-air executions. In POPL 2016: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, St. Petersburg, FL, USA, January 20 - 22, 2016, pages 622-633, 2016. [ bib | DOI | http | .pdf ]
Despite much research on concurrent programming languages, especially for Java and C/C++, we still do not have a satisfactory definition of their semantics, one that admits all common optimisations without also admitting undesired behaviour. Especially problematic are the “thin-air” examples involving high-performance concurrent accesses, such as C/C++11 relaxed atomics. The C/C++11 model is in a per-candidate-execution style, and previous work has identified a tension between that and the fact that compiler optimisations do not operate over single candidate executions in isolation; rather, they operate over syntactic representations that represent all executions.

In this paper we propose a novel approach that circumvents this difficulty. We define a concurrency semantics for a core calculus, including relaxed-atomic and non-atomic accesses, and locks, that admits a wide range of optimisation while still forbidding the classic thin-air examples. It also addresses other problems relating to undefined behaviour.

The basic idea is to use an event-structure representation of the current state of each thread, capturing all of its potential executions, and to permit interleaving of execution and transformation steps over that to reflect optimisation (possibly dynamic) of the code. These are combined with a non-multi-copy-atomic storage subsystem, to reflect common hardware behaviour.

The semantics is defined in a mechanised and executable form, and designed to be implementable above current relaxed hardware and strong enough to support the programming idioms that C/C++11 does for this fragment. It offers a potential way forward for concurrent programming language semantics, beyond the current C/C++11 and Java models.

[-] Mark Batty, Kayvan Memarian, Kyndylan Nienhuis, Jean Pichon-Pharabod, and Peter Sewell. The problem of programming language concurrency semantics. In ESOP 2015: Programming Languages and Systems - 24th European Symposium on Programming, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2015, London, UK, April 11-18, 2015., pages 283-307, 2015. [ bib | DOI | http | .pdf ]

Despite decades of research, we do not have a satisfactory concurrency semantics for any general-purpose programming language that aims to support concurrent systems code. The Java Memory Model has been shown to be unsound with respect to standard compiler optimisations, while the C/C++11 model is too weak, admitting undesirable thin-air executions. Our goal in this paper is to articulate this major open problem as clearly as is currently possible, showing how it arises from the combination of multiprocessor relaxed-memory behaviour and the desire to accommodate current compiler optimisations. We make several novel contributions that each shed some light on the problem, constraining the possible solutions and identifying new difficulties.

[-] Mark John Batty. The C11 and C++11 Concurrency Model. PhD thesis, University of Cambridge Computer Laboratory, 2014. [ bib ]

[-] Reinoud Elhorst, Mark Batty, and David Chisnall. Efficient code generation for weakly ordered architectures. Presentation at the 4th European LLVM conference (EuroLLVM), April 2014. Slides and report available at http://llvm.org/devmtg/2014-04/. [ bib ]

This report explores the way LLVM generates the memory barriers needed to support the C11/C++11 atomics for ARM. I measure the influence of memory barriers on performance, and I show that in some cases LLVM generates too many barriers. By leaving these barriers out, performance increases significantly. I introduce two LLVM passes, which will remove these extra barriers, improving performance in my test by 40%. I believe one of these passes is ready to be upstreamed to LLVM, while the other will need more testing.

[-] Reinoud Elhorst. Lowering C11 atomics for ARM in LLVM. [ bib | .pdf ]

This report explores the way LLVM generates the memory barriers needed to support the C11/C++11 atomics for ARM. I measure the influence of memory barriers on performance, and I show that in some cases LLVM generates too many barriers. By leaving these barriers out, performance increases significantly. I introduce two LLVM passes, which will remove these extra barriers, improving performance in my test by 40%. I believe one of these passes is ready to be upstreamed to LLVM, while the other will need more testing.

[-] Gabriel Kerneis, Charlie Shepherd, and Stefan Hajnoczi. QEMU/CPC: static analysis and CPS conversion for safe, portable, and efficient coroutines. In PEPM 2014: Proceedings of the ACM SIGPLAN 2014 workshop on Partial evaluation and program manipulation, January 20-21, 2014, San Diego, California, USA, pages 83-94, 2014. [ bib | DOI | http ]

Coroutines and events are two common abstractions for writing concurrent programs. Because coroutines are often more convenient, but events more portable and efficient, it is natural to want to translate the former into the latter. CPC is such a source-to-source translator for C programs, based on a partial conversion into continuation-passing style (CPS conversion) of functions annotated as cooperative.

In this article, we study the application of the CPC translator to QEMU, an open-source machine emulator which also uses annotated coroutine functions for concurrency. We first propose a new type of annotations to identify functions which never cooperate, and we introduce CoroCheck, a tool for the static analysis and inference of cooperation annotations. Then, we improve the CPC translator, defining CPS conversion as a calling convention for the C language, with support for indirect calls to CPS-converted function through function pointers. Finally, we apply CoroCheck and CPC to QEMU (750 000 lines of C code), fixing hundreds of missing annotations and comparing performance of the translated code with existing implementations of coroutines in QEMU.

Our work shows the importance of static annotation checking to prevent actual concurrency bugs, and demonstrates that CPS conversion is a flexible, portable, and efficient compilation technique, even for very large programs written in an imperative language.

This builds on our earlier work on Relaxed-Memory Concurrency, especially:

People

Funding

This work is funded by REMS: Rigorous Engineering for Mainstream Systems.