ARMv8-A system semantics: instruction fetch in relaxed architectures

Ben Simner, Shaked Flur, Christopher Pulte, Alasdair Armstrong, Jean Pichon-Pharabod, Luc Maranget, Peter Sewell

Abstract

Computing relies on architecture specifications to decouple hardware and software development. Historically these have been prose documents, with all the problems that entails, but research over the last ten years has developed rigorous and executable-as-test-oracle specifications of mainstream architecture instruction sets and "user-mode" concurrency, clarifying architectures and bringing them into the scope of programming-language semantics and verification. However, the system semantics, of instruction-fetch and cache maintenance, exceptions and interrupts, and address translation, remains obscure, leaving us without a solid foundation for verification of security-critical systems software.

In this paper we establish a robust model for one aspect of system semantics: instruction fetch and cache maintenance for ARMv8-A. Systems code relies on executing instructions that were written by data writes, e.g. in program loading, dynamic linking, JIT compilation, debugging, and OS configuration, but hardware implementations are often highly optimised, e.g. with instruction caches, linefill buffers, out-of-order fetching, branch prediction, and instruction prefetching, which can affect programmer-observable behaviour. It is essential, both for programming and verification, to abstract from such microarchitectural details as much as possible, but no more. We explore the key architecture design questions with a series of examples, discussed in detail with senior Arm staff; capture the architectural intent in operational and axiomatic semantic models, extending previous work on "user-mode" concurrency; make these models executable as test oracles for small examples; and experimentally validate them against hardware behaviour (finding a bug in one hardware device). We thereby bring these subtle issues into the mathematical domain, clarifying the architecture and enabling future work on system software verification.

Links

The Draft Paper PDF is available online at https://www.cl.cam.ac.uk/~bs630/files/publications/2020-ESOP-ifetch-draft.pdf.

The full operational prose can be found at https://www.cl.cam.ac.uk/~bs630/files/iflat_full_operational.pdf.

The full axiomatic text can be found at https://www.cl.cam.ac.uk/~bs630/files/iflat_full_axiomatic.pdf.

The executable tool itself, RMEM, has a web interface available at https://www.cl.cam.ac.uk/~sf502/rmem-esop20/.

Its source is available at https://github.com/rems-project/rmem.

Executable Semantics for ARMv8-A

RMEM is our executable-as-a-test-oracle semantics for the ARMv8-A, RISC-V, IBM POWER, and x86 operational relaxed memory models.

Source

RMEM is open-source and can be found at https://github.com/rems-project/rmem.

There is a reasonably direct correspondence between the transitions in the prose description of the operational model and the transitions in the source.

For example, the "Fetch Request" transition is defined by the enumerate_init_fetch_transitions_of_instruction function in machineDefThreadSubsystem.lem. The prose described the set of valid next addresses as a guard on arbitrary addresses but to be executable the implementation insteads enumerates valid ones directly using potential_next_addresses_of_instruction_relaxed. These addresses are then mapped into new fetch request transitions (called T_init_fetch) whose action is described by a function which updates that thread's tree with a new entry.

All transitions described in the prose document correspond to some similar code that produces lists of transitions, Thread transitions (e.g. instruction transitions) are found in machineDefThreadSubsystem.lem and storage transitions (e.g. icache updates) are found in machineDefFlatStorageSubsystem.lem.

Example (Self-modifying)

Below is a step-by-step guide for interactively running the self-modifying example (SM) from the paper and its first few transitions:

Open the RMEM web interface, it should load to a page with with 4 blank boxes and a "Help" box in the bottom left.
Click the "Load Litmus" button at the top of the screen, a "Load litmus test" dialog should appear.
Click "Load from library..." and a "Load litmus test from library..." dialog should appear with 4 tabs: "Power", "ARMv8 (AArch64)", "RISC-V (Experimental)", and "x86 (Experimental)"
Select the "ARMv8 (AArch64)" tab.
On the left-hand side select the "Instruction Fetch (ESOP 20)" tab.
In the main tab pane, select "SM", this is our self-modifying test.
Now RMEM will load the SM test, but we still need to tell RMEM to use the relaxed fetching mode:
Click on the "Model" button at the top of the screen, select "Shallow embedding" under instruction semantics, and then check "Relaxed Fetch" under Out-of-Order mode, some new options should appear. These options default to the strongest. Select "Out-of-Order Fetching" and uncheck the "IDC=1" and "DIC=1" checkboxes to get the weakest semantics discussed in the paper. Now click "Ok" and the test should be loaded and ready to explore.

The top-right box now contains a diagrammatic view of the instruction tree for each thread and the shared memory (in the top-left). The top-left box shows the current state of the operational model's state machine. Available transitions are highlighted in blue.

In the top-right box, Click the blue highlighted "0: init fetch new instruction: 0x050000" link in Thread 0 to create a fetch request for Thread 0. Note that since we have not yet taken any icache update transitions we cannot satisfy the fetch (it would miss). A "0:1 UNFETCHED" box should appear in Thread 0's instruction tree, representing a request to fetch but not yet satisfied from the instruction cache.
In the top-left box, click the "1 icache update: 0x050000 (1000:0:0):W 0x050000/4=0xb8000c20" transition to populate the instruction cache of Thread 0 with an entry for that address, from the initial memory state.
Now Thread 0's "0:1 UNFETCHED" box should have updated, with a new transition "1 0:1 fetch instruction: 0x050000 0xb8000c20 from (1000:0:0):W 0x050000/4=0xb8000c20 |STR W0,[X1,#0]! (opcode: 0xb8000c20)|", click it and the fetch transition for that instruction will be taken and you will see the box update to "0:1 STR W0,[X1,#0]!" indicating the fetched instruction was a "STR"
Take the "1 0:1 decode: 0x050000 0xb8000c20 from (1000:0:0):W 0x050000/4=0xb8000c20 |STR W0,[X1,#0]! (opcode: 0xb8000c20)|" transition to decode the instruction and enable the execution transitions.

Errata (added 2023-09-11)

In the extended version, Appendix B.5 ("Model Transitions"), for the following transitions:

"Fetch instruction"
"Fetch instruction (unpredictable)"
"Fetch instruction (B.cond)"

Their text should read "can be satisfied from the instruction cache" rather than "from the memory and abstract data cache", and "Fetch instruction"'s condition should say "the write-slices ws have the 4-byte footprint of the entry and can be constructed from a write in the instruction cache".