The C Memory Quiz (DRAFT, Version 0.2)

This survey explores the relationship between the C language as implemented and used in practice and the C standard. It consists of about 40 questions about the behaviour of memory, pointers, and values. For each, we're interested in answers from several different perspectives:
  • the language that systems programmers believe they are writing in, i.e., the assumptions they make about what behaviour they can rely on;
  • the idioms used in the corpus of mainstream systems code out there, especially in specific large-scale systems (Linux, FreeBSD, Xen, Apache, etc.);
  • the languages implemented by mainstream compilers (GCC, Clang, ICC, MSVC, etc.), including the assumptions their optimisation passes make about user code and how these change with certain flags (e.g. GCC's -fno-strict-aliasing and -fno-strict-overflow);
  • the C standard (C11 or earlier), which might give a clear answer, be unclear, be contradictory, or not address the issue at all;
  • the issues that arise in making C code portable between different compilers and architectures;
  • the behaviour assumed by code analysis tools; and
  • the impact on formal semantics.
If you can speak to any of these in particular, or can give real-world examples where the behaviour in question is (or must not be) relied on, please do so in the `comments' boxes below. We are especially interested in the differences between the C language as it is commonly used and the language as specified by the standard, in differing interpretations of the standard, in precisely characterising the boundary between defined and undefined behaviour, and in the ways that compilers exploit undefined behaviour in optimisations.

Recall that if a program is deemed by the standard to have undefined behaviour, then the standard imposes no requirements at all on how an implementation can treat that program. When we ask whether an idiom is free of undefined behaviour with respect to some implementation (e.g. GCC or Clang), the question is really whether that implementation assumes (for optimisation) that legal programs do not use that idiom and so may give surprising results for any programs that do. We say that a `usable pointer' is one that can be written to or read from without causing undefined behaviour.

For the questions about the standard, we're most interested in what you believe it says. If you're familiar with the standard and able to justify your answers with reference to it, that would also be interesting, but it's not necessary.

We illustrate each question with an example program. We've made these as simple as possible, but to see interesting implementation behaviour one might need more complex examples, with a bigger context, to give analysis and optimisation passes something to work on. Please bear that in mind when answering the questions, rather than focussing exclusively on how the code as written would be compiled.

NB: the `SUBMIT' button is right at the bottom of the form; your input will be lost unless you press it. You can submit multiple responses if you want to do the survey in parts; if you do, please write your name and email exactly the same way in each, and avoid giving overlapping but non-identical responses to the same question.

Preamble

Pointer abstract values and pointer representations

C gives access both to abstract values and to their concrete byte representations. For most types, the abstract value is determined by the concrete representation (though there might be multiple representations for the same abstract value, and some representations might not denote any abstract value). For pointer values, however, it is arguable that the provenance of a pointer should be significant in determining what behaviour is undefined, how pointers compare, and what effect reads and writes using them have, whether or not that provenance is recorded in implementation runtime data. The questions in this section probe various aspects of this. Some are based on examples from Defect Report 260.
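
As a hedged illustration of the kind of program these questions concern, the sketch below copies a pointer through its byte representation and returns the copy. The function name `copy_pointer_bytewise` is ours, not taken from the survey or from Defect Report 260. Mainstream implementations give back a pointer that compares equal to the original and can be used to access the same object; whether the standard guarantees that the copy carries the original provenance is the sort of thing this section asks about.

```c
#include <stddef.h>

/* Copy a pointer value one byte at a time through its object
   representation and return the copy. Accessing the bytes through
   unsigned char is permitted for any object type. */
int *copy_pointer_bytewise(int *p)
{
    int *q;
    unsigned char *src = (unsigned char *)&p;
    unsigned char *dst = (unsigned char *)&q;

    for (size_t i = 0; i < sizeof p; i++)
        dst[i] = src[i];

    return q;
}
```

A caller would typically pass the address of a live object and then read or write through the returned pointer.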

Pointer Comparison

In this section we look at when pointers can be compared.
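
For orientation, here is a minimal sketch of the two uncontroversial cases, with function names of our own choosing. As we read C11, equality (6.5.9) is defined for pointers to any two objects of compatible type, while relational comparison (6.5.8p5) is defined only for pointers into the same array or aggregate object; the questions probe what happens outside these cases.

```c
#include <stddef.h>

/* Equality comparison: defined even when p and q point to
   different objects. */
int pointers_equal(const int *p, const int *q)
{
    return p == q;
}

/* Relational comparison: used here only on two elements of the
   same array, where the standard defines the result. */
int element_precedes(const int *array, size_t i, size_t j)
{
    return &array[i] < &array[j];
}
```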

Casting of pointers to and from integers

This section explores when one can compute on a pointer by casting it to an integer type and back.
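
A common instance of this idiom is stashing a flag in the low bit of a pointer. The sketch below is illustrative only: `tag_and_untag` is a hypothetical name, and the code assumes that `uintptr_t` is provided (it is optional in the standard), that the integer value of an `int` pointer has a zero low bit, and that the implementation-defined pointer/integer mapping is the usual flat one.

```c
#include <stdint.h>

/* Set and then clear the low bit of a pointer's integer value.
   Assumes int objects are at least 2-byte aligned, so the low bit
   of the original address is zero. */
int *tag_and_untag(int *p)
{
    uintptr_t bits = (uintptr_t)(void *)p;

    bits |= (uintptr_t)1;   /* stash a flag in the low bit */
    bits &= ~(uintptr_t)1;  /* clear it again */

    return (int *)(void *)bits;
}
```

The standard only promises that a `void *` converted to `uintptr_t` and back compares equal to the original; whether arithmetic on the intermediate integer preserves usability is what the questions here ask.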

Casting of pointers: roundtrip properties

This question asks how generally one can cast a pointer to other pointer types and then back to the original.
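
Two round trips that C11 does appear to guarantee (6.3.2.3p1 and p7) can be sketched as follows; the function names are ours. The open question is how far beyond these cases the guarantee extends, for example through pointer types with stricter alignment.

```c
/* Round trip through void *: the result compares equal to p. */
int *roundtrip_via_void(int *p)
{
    void *v = p;
    return v;
}

/* Round trip through a character pointer type, which has the
   weakest alignment requirement, so the conversion cannot fail
   on alignment grounds. */
int *roundtrip_via_char(int *p)
{
    unsigned char *c = (unsigned char *)p;
    return (int *)c;
}
```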

Subobject Casts

These questions explore the extent to which one can cast a pointer to a subobject (a struct or union member or array element) into a usable pointer to the whole object, and whether one can move within a struct by address arithmetic.
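
The best-known form of this idiom is the `container_of` pattern used in systems code. The sketch below is a simplified version under our own names (`struct pair`, `pair_from_second`); it works on mainstream implementations, but whether the standard makes the resulting pointer usable is one of the things these questions are meant to establish.

```c
#include <stddef.h>

struct pair {
    int first;
    int second;
};

/* Recover a pointer to the enclosing struct from a pointer to its
   `second' member by stepping back offsetof bytes. */
struct pair *pair_from_second(int *second)
{
    unsigned char *bytes = (unsigned char *)second;
    return (struct pair *)(bytes - offsetof(struct pair, second));
}
```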

Compound Object Casts and Layout

This section explores the assumptions one can make about the layout of structurally similar types.
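
As a baseline, the one layout guarantee we are confident of is that a pointer to a struct, suitably converted, points to its first member (C11 6.7.2.1p15). The sketch below uses hypothetical types `struct header` and `struct message` to show that case; the questions concern how much further such reasoning can be pushed, for example between distinct struct types that merely share leading members.

```c
struct header {
    int tag;
};

struct message {
    struct header head;  /* first member: no padding before it */
    int payload;
};

/* Read the tag through a pointer to the first member obtained by
   converting the pointer to the whole struct. */
int tag_of(struct message *m)
{
    struct header *h = (struct header *)m;
    return h->tag;
}
```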

Objects and Malloc'd Regions

These questions examine how a dynamically allocated region of memory can be used.
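
One typical use is carving a single allocation into several objects of different types. The following sketch (with the hypothetical name `carve_region`) assumes that `sizeof(double)` is a multiple of the alignment of `int`, which holds on mainstream platforms but is not guaranteed by the standard.

```c
#include <stdlib.h>

/* Place a double and an int side by side in one malloc'd region.
   Returns 1 if both values read back correctly, -1 if allocation
   fails. Assumes the offset sizeof(double) is suitably aligned
   for int. */
int carve_region(void)
{
    unsigned char *region = malloc(2 * sizeof(double));
    if (region == NULL)
        return -1;

    double *d = (double *)region;
    int *i = (int *)(region + sizeof(double));

    *d = 2.5;
    *i = 4;
    int ok = (*d == 2.5) && (*i == 4);

    free(region);
    return ok;
}
```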

Effective Types

Effective types are introduced in 6.5p6 of the standard, and 6.5p7 uses them to give some additional constraints on the accesses made by well-defined programs, apparently to enable type-based alias analysis. This question explores what force this has for objects in dynamically allocated regions of memory: when do such regions acquire effective-type constraints, and which ones?
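
A minimal sketch of the uncontroversial direction, under our reading of 6.5p6: storing through a non-character lvalue into allocated storage sets the effective type for subsequent reads, so a region can be written as `int` and later rewritten and read as `float`. The name `retype_region` is ours.

```c
#include <stdlib.h>

/* Write an int into a malloc'd region, then overwrite the same
   bytes with a float and read that float back. The second store
   gives the region the effective type float for the read.
   Returns 0.0f if allocation fails. */
float retype_region(void)
{
    size_t size = sizeof(int) > sizeof(float) ? sizeof(int) : sizeof(float);
    void *region = malloc(size);
    if (region == NULL)
        return 0.0f;

    *(int *)region = 1;       /* effective type becomes int */
    *(float *)region = 1.5f;  /* effective type becomes float */
    float result = *(float *)region;

    free(region);
    return result;
}
```

Reading the region as `float` after only the `int` store would be the problematic case that 6.5p7 rules out.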

Representation Casts

The standard allows one to inspect and manipulate the representations of arbitrary types via char. These questions explore whether one can do this via other types.
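
For comparison with the casts these questions ask about, the sketch below shows the conservative way to view a `float` as a 32-bit integer: copying the bytes with `memcpy` rather than dereferencing a `uint32_t *` that points at the float. The function name `float_bits` is ours, and the code assumes `float` is 32 bits wide (checked at compile time).

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "this sketch assumes a 32-bit float");

/* Return the object representation of f as a 32-bit integer by
   copying bytes, which avoids accessing the float through an
   lvalue of an unrelated type. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

The alternative `*(uint32_t *)&f` is widespread in practice but is exactly the sort of access whose status these questions are trying to pin down.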

Unspecified values and stability

A variable with automatic storage duration that is not explicitly initialised has an `indeterminate value', which is either a `trap representation' or an `unspecified value'. These questions explore what one can assume about the behaviour of unspecified values.

Trap Representations

Trap representations do not represent values of the object type, and reading a trap representation (except by an lvalue of character type) is undefined behaviour. (Note that this does not mean that reading a trap representation must give rise to some hardware trap: trap representations might simply license some compiler optimisation.)

Multiple Representations

A C implementation might have multiple distinct representations for the same abstract value. These questions explore what one can assume about this.
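
A familiar example, assuming an implementation with IEEE 754 floating point, is positive and negative zero: they compare equal as values but differ in their bytes, so `memcmp` is not a sound test for value equality. The function names below are ours.

```c
#include <string.h>

/* Compare +0.0 and -0.0 as values. */
int zeros_equal_as_values(void)
{
    double pos = 0.0, neg = -0.0;
    return pos == neg;
}

/* Compare +0.0 and -0.0 by object representation. On IEEE 754
   implementations the sign bit differs, so the bytes differ. */
int zeros_equal_as_bytes(void)
{
    double pos = 0.0, neg = -0.0;
    return memcmp(&pos, &neg, sizeof pos) == 0;
}
```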

Representations

Any other issues?