: N3005: A Provenance-aware Memory Object Model for C. Working Draft Technical Specification ISO/IEC TS 6010:2023 (E), Jens Gustedt, Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, and Martin Uecker. ISO/IEC JTC1/SC22/WG14 N3005 http://www.open-std.org/jtc1/sc22/wg14/www/docs/n3005.pdf, June 2022. [ bib | pdf ]
In a committee discussion from 2004 concerning DR260, WG14 confirmed the concept of provenance of pointers, introduced as a means to track and distinguish pointer values that represent storage instances with the same address but non-overlapping lifetimes. Implementations started to use that concept, in optimisations relying on provenance-based alias analysis, without it ever being clearly or formally defined, and without it being integrated consistently with the rest of the C standard. This Technical Specification provides a solution for this: a provenance-aware memory object model for C to put C programmers and implementers on a solid footing in this regard.
: N2676: A Provenance-aware Memory Object Model for C. Working Draft Technical Specification, Jens Gustedt, Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, and Martin Uecker. ISO/IEC JTC1/SC22/WG14 N2676 http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2676.pdf, March 2021. [ bib | pdf ]
In a committee discussion from 2004 concerning DR260, WG14 confirmed the concept of provenance of pointers, introduced as a means to track and distinguish pointer values that represent storage instances with the same address but non-overlapping lifetimes. Implementations started to use that concept, in optimisations relying on provenance-based alias analysis, without it ever being clearly or formally defined, and without it being integrated consistently with the rest of the C standard. This Technical Specification provides a solution for this: a provenance-aware memory object model for C to put C programmers and implementers on a solid footing in this regard.
: N2577: A Provenance-aware Memory Object Model for C. Working Draft Technical Specification, Jens Gustedt, Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, and Martin Uecker. ISO/IEC JTC1/SC22/WG14 N2577 http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2577.pdf, September 2020. [ bib | pdf ]
In a committee discussion from 2004 concerning DR260, WG14 confirmed the concept of provenance of pointers, introduced as a means to track and distinguish pointer values that represent storage instances with the same address but non-overlapping lifetimes. Implementations started to use that concept, in optimisations relying on provenance-based alias analysis, without it ever being clearly or formally defined, and without it being integrated consistently with the rest of the C standard. This Technical Specification provides a solution for this: a provenance-aware memory object model for C to put C programmers and implementers on a solid footing in this regard.
: P1726R0: Pointer lifetime-end zap, Paul E. McKenney, Maged Michael, Jens Maurer, Peter Sewell, Martin Uecker, Hans Boehm, Hubert Tong, and Niall Douglas. ISO/IEC JTC1/SC22/WG21 P1726R0, June 2019. [ bib | pdf ]
: N2378: C provenance semantics: slides (extracts from N2363), Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, Jens Gustedt, and Martin Uecker. ISO/IEC JTC1/SC22/WG14 N2378 v1, April 2019. [ bib | project page | pdf ]
: N2369: Pointer lifetime-end zap, Paul E. McKenney, Maged Michael, and Peter Sewell. ISO/IEC JTC1/SC22/WG14 N2369, April 2019. [ bib | project page | pdf ]
: N2364: C provenance semantics: detailed semantics (for PNVI-plain, PNVI address-exposed, PNVI address-exposed user-disambiguation, and PVI models), Peter Sewell, Kayvan Memarian, and Victor B. F. Gomes. ISO/IEC JTC1/SC22/WG14 N2364, April 2019. [ bib | project page | pdf ]
: N2363: C provenance semantics: examples, Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, Jens Gustedt, and Martin Uecker. ISO/IEC JTC1/SC22/WG14 N2363, April 2019. [ bib | project page | pdf ]
: N2362: Moving to a provenance-aware memory object model for C, Jens Gustedt, Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, and Martin Uecker. ISO/IEC JTC1/SC22/WG14 N2362 v1, March 2019. [ bib | project page | pdf ]
: Exploring C Semantics and Pointer Provenance. Kayvan Memarian, Victor B. F. Gomes, Brooks Davis, Stephen Kell, Alexander Richardson, Robert N. M. Watson, and Peter Sewell. In POPL 2019, Proc. ACM Program. Lang. 3, POPL, Article 67. Also available as ISO/IEC JTC1/SC22/WG14 N2311. [ bib | doi | supplementary material | project page | pdf ]
The semantics of pointers and memory objects in C has been a vexed question for many years. C values cannot be treated as either purely abstract or purely concrete entities: the language exposes their representations, but compiler optimisations rely on analyses that reason about provenance and initialisation status, not just runtime representations. The ISO WG14 standard leaves much of this unclear, and in some respects differs from de facto standard usage, which itself is difficult to investigate.
In this paper we explore the possible source-language semantics for memory objects and pointers, in ISO C and in C as it is used and implemented in practice, focussing especially on pointer provenance. We aim to, as far as possible, reconcile the ISO C standard, mainstream compiler behaviour, and the semantics relied on by the corpus of existing C code. We present two coherent proposals, one tracking provenance via integers and one not; both address many design questions. We highlight some pros and cons and open questions, and illustrate the discussion with a library of test cases. We make our semantics executable as a test oracle, integrating it with the Cerberus semantics for much of the rest of C, which we have made substantially more complete and robust, and equipped with a web-interface GUI. This allows us to experimentally assess our proposals on those test cases. To assess their viability with respect to larger bodies of C code, we analyse the changes required and the resulting behaviour for a port of FreeBSD to CHERI, a research architecture supporting hardware capabilities, which (roughly speaking) traps on the memory safety violations which our proposals deem undefined behaviour. We also develop a new runtime instrumentation tool to detect possible provenance violations in normal C code, and apply it to some of the SPEC benchmarks. We compare our proposal with a source-language variant of the twin-allocation LLVM semantics proposal of Lee et al. Finally, we describe ongoing interactions with WG14, exploring how our proposals could be incorporated into the ISO standard.
: N2263: Clarifying Pointer Provenance (Q1-Q20) v4, Kayvan Memarian, Victor Gomes, and Peter Sewell. ISO SC22 WG14 N2263, March 2018. [ bib | project page | html ]
: N2223: Clarifying the C Memory Object Model: Introduction to N2219 -- N2222, Kayvan Memarian, Victor Gomes, and Peter Sewell. ISO SC22 WG14 N2223, March 2018. [ bib | project page | html ]
: N2222: Further Pointer Issues (Q21-Q46), Kayvan Memarian, Victor Gomes, and Peter Sewell. ISO SC22 WG14 N2222, March 2018. [ bib | project page | html ]
: N2221: Clarifying Unspecified Values (Q48-Q59) v3, Kayvan Memarian, Victor Gomes, and Peter Sewell. ISO SC22 WG14 N2221, March 2018. [ bib | project page | html ]
: N2220: Clarifying Trap Representations (Q47) v3, Kayvan Memarian, Victor Gomes, and Peter Sewell. ISO SC22 WG14 N2220, March 2018. [ bib | project page | html ]
: N2219: Clarifying Pointer Provenance (Q1-Q20) v3, Kayvan Memarian, Victor Gomes, and Peter Sewell. ISO SC22 WG14 N2219, March 2018. [ bib | project page | html ]
: N2091: Clarifying Trap Representations, Kayvan Memarian and Peter Sewell. ISO SC22 WG14 N2091, September 2016. [ bib | project page | html ]
: N2090: Clarifying Pointer Provenance, Kayvan Memarian and Peter Sewell. ISO SC22 WG14 N2090, September 2016. [ bib | project page | html ]
: N2089: Clarifying Unspecified Values, Kayvan Memarian and Peter Sewell. ISO SC22 WG14 N2089, September 2016. [ bib | project page | html ]
: Into the depths of C: elaborating the de facto standards. Kayvan Memarian, Justus Matthiesen, James Lingard, Kyndylan Nienhuis, David Chisnall, Robert N.M. Watson, and Peter Sewell. In PLDI 2016 (PLDI 2016 Distinguished Paper award). [ bib | doi | project page | pdf | http ]
C remains central to our computing infrastructure. It is notionally defined by ISO standards, but in reality the properties of C assumed by systems code and those implemented by compilers have diverged, both from the ISO standards and from each other, and none of these are clearly understood.
We make two contributions to help improve this error-prone situation. First, we describe an in-depth analysis of the design space for the semantics of pointers and memory in C as it is used in practice. We articulate many specific questions, build a suite of semantic test cases, gather experimental data from multiple implementations, and survey what C experts believe about the de facto standards. We identify questions where there is a consensus (either following ISO or differing) and where there are conflicts. We apply all this to an experimental C implemented above capability hardware. Second, we describe a formal model, Cerberus, for large parts of C. Cerberus is parameterised on its memory model; it is linkable either with a candidate de facto memory object model, under construction, or with an operational C11 concurrency model; it is defined by elaboration to a much simpler Core language for accessibility, and it is executable as a test oracle on small examples.
This should provide a solid basis for discussion of what mainstream C is now: what programmers and analysis tools can assume and what compilers aim to implement. Ultimately we hope it will be a step towards clear, consistent, and accepted semantics for the various use-cases of C.
: N2015: What is C in practice? (Cerberus survey v2): Analysis of Responses -- with Comments, Kayvan Memarian and Peter Sewell. ISO SC22 WG14 N2015, March 2016. [ bib | project page | txt ]
: N2014: What is C in practice? (Cerberus survey v2): Analysis of Responses, Kayvan Memarian and Peter Sewell. ISO SC22 WG14 N2014, March 2016. [ bib | project page | html ]
: N2013: C memory object and value semantics: the space of de facto and ISO standards, David Chisnall, Justus Matthiesen, Kayvan Memarian, Kyndylan Nienhuis, Peter Sewell, and Robert N. M. Watson. ISO SC22 WG14 N2013, March 2016. [ bib | project page | pdf ]
: N2012: Clarifying the C memory object model, Kayvan Memarian and Peter Sewell. ISO SC22 WG14 N2012, March 2016. [ bib | project page | html ]
: P1797R0: C/C++ Memory Object Model Papers -- Introduction, Peter Sewell. ISO/IEC JTC1/SC22/WG21 P1797R0. [ bib | html ]
: P1796R0: Effective Types: Examples, Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, Jens Gustedt, and Hubert Tong. ISO/IEC JTC1/SC22/WG21 P1796R0. [ bib | pdf ]