We present WasmRef-Isabelle, a monadic interpreter for WebAssembly written in Isabelle/HOL and proven correct with respect to the WasmCert-Isabelle mechanisation of WebAssembly. WasmRef-Isabelle has been adopted and deployed as a fuzzing oracle in the continuous integration infrastructure of Wasmtime, a widely used WebAssembly implementation. Previous efforts to fuzz Wasmtime against WebAssembly's official OCaml reference interpreter were abandoned by Wasmtime's developers after the reference interpreter exhibited unacceptable performance characteristics, which its maintainers decided not to fix in order to preserve the interpreter's close definitional correspondence with the official specification. With WasmRef-Isabelle, we achieve the best of both worlds - an interpreter fast enough to be useable as a fuzzing oracle that also maintains a close correspondence with the specification through a mechanised proof of correctness. We verify the correctness of WasmRef-Isabelle through a two-step refinement proof in Isabelle/HOL. We demonstrate that WasmRef-Isabelle significantly outperforms the official reference interpreter, has performance comparable to a Rust debug build of the industry WebAssembly interpreter Wasmi, and competes with unverified oracles on fuzzing throughput when deployed in Wasmtime's fuzzing infrastructure. We also present several new extensions to WasmCert-Isabelle which enhance WasmRef-Isabelle's utility as a fuzzing oracle: we add support for a number of upcoming WebAssembly features, and fully mechanise the numeric semantics of WebAssembly's integer operations.
Most programs compiled to WebAssembly (Wasm) today are written in unsafe languages like C and C++. Unfortunately, memory-unsafe C code remains unsafe when compiled to Wasm---and attackers can exploit buffer overflows and use-after-frees in Wasm almost as easily as they can on native platforms. Memory- Safe WebAssembly (MSWasm) proposes to extend Wasm with language-level memory-safety abstractions to precisely address this problem. In this paper, we build on the original MSWasm position paper to realize this vision. We give a precise and formal semantics of MSWasm, and prove that well-typed MSWasm programs are, by construction, robustly memory safe. To this end, we develop a novel, language-independent memory-safety property based on colored memory locations and pointers. This property also lets us reason about the security guarantees of a formal C-to-MSWasm compiler---and prove that it always produces memory-safe programs (and preserves the semantics of safe programs). We use these formal results to then guide several implementations: Two compilers of MSWasm to native code, and a C-to-MSWasm compiler (that extends Clang). Our MSWasm compilers support different enforcement mechanisms, allowing developers to make security-performance trade-offs according to their needs. Our evaluation shows that on the PolyBenchC suite, the overhead of enforcing memory safety in software ranges from 22% (enforcing spatial safety alone) to 198% (enforcing full memory safety), and 51.7% when using hardware memory capabilities for spatial safety and pointer integrity.
More importantly, MSWasm’s design makes it easy to swap between enforcement mechanisms; as fast (especially hardware-based) enforcement techniques become available, MSWasm will be able to take advantage of these advances almost for free.
Software sandboxing or software-based fault isolation (SFI) is a lightweight approach to building secure systems out of untrusted components. Mozilla, for example, uses SFI to harden the Firefox browser by sandboxing third-party libraries, and companies like Fastly and Cloudflare use SFI to safely co-locate untrusted tenants on their edge clouds. While there have been significant efforts to optimize and verify SFI enforcement, context switching in SFI systems remains largely unexplored: almost all SFI systems use heavyweight transitions that are not only error-prone but incur significant performance overhead from saving, clearing, and restoring registers when context switching. We identify a set of zero-cost conditions that characterize when sandboxed code has sufficient structured to guarantee security via lightweight zero-cost transitions (simple function calls). We modify the Lucet Wasm compiler and its runtime to use zero-cost transitions, eliminating the undue performance tax on systems that rely on Lucet for sandboxing (e.g., we speed up image and font rendering in Firefox by up to 29.7% and 10% respectively). To remove the Lucet compiler and its correct implementation of the Wasm specification from the trusted computing base, we (1) develop a static binary verifier, VeriZero, which (in seconds) checks that binaries produced by Lucet satisfy our zero-cost conditions, and (2) prove the soundness of VeriZero by developing a logical relation that captures when a compiled Wasm function is semantically well-behaved with respect to our zero-cost conditions. Finally, we show that our model is useful beyond Wasm by describing a new, purpose-built SFI system, SegmentZero32, that uses x86 segmentation and LLVM with mostly off-the-shelf passes to enforce our zero-cost conditions; our prototype performs on-par with the state-of-the-art Native Client SFI system.
This thesis describes a number of ways in which I have both helped to shape the specification of WebAssembly, and built upon it. By mechanising the WebAssembly formal semantics in Isabelle/HOL while it was being drafted, I discovered a number of errors in the specification, drove the adoption of official corrections, and provided the first type soundness proof for the corrected language. This thesis also details a verified type checker and interpreter, and a security type system extension for cryptography primitives, all of which have been mechanised as extensions of my initial WebAssembly mechanisation.
My work demonstrates the value of formalisation and mechanisation in industrial programming language design, not only in discovering and correcting specification errors, but also in building confidence both in the correctness of the language’s design and in the design of proposed extensions.
WebAssembly (Wasm) is a new bytecode language supported by all major Web browsers, designed primarily to be an efficient compilation target for low-level languages such as C/C++ and Rust. It is unusual in that it is officially specified through a formal semantics. An initial draft specification was published in 2017 with an associated mechanised specification in Isabelle/HOL published by Watt that found bugs in the original specification, fixed before its publication. The first official W3C standard, WebAssembly 1.0, was published in 2019. Building on Watt’s original mechanisation, we introduce two mechanised specifications of the WebAssembly 1.0 semantics, written in different theorem provers: WasmCert-Isabelle and WasmCert-Coq. Wasm’s compact design and official formal semantics enable our mechanisations to be particularly complete and close to the published language standard. We present a high-level description of the language’s updated type soundness result, referencing both mechanisations. We also describe the current state of the mechanisation of language features not previously supported: WasmCert-Isabelle includes a verified executable definition of the instantiation phase as part of an executable verified interpreter; WasmCert-Coq includes executable parsing and numeric definitions as on-going work towards a more ambitious end-to-end verified interpreter which does not require an OCaml harness like WasmCert-Isabelle.
We show that the model, in violation of the design intention, does not support a compilation scheme to ARMv8 which is used in practice. We propose a correction, which also incorporates a previously proposed fix for a failure of the model to provide Sequential Consistency of Data-Race-Free programs (SC-DRF), an important correctness condition. We use model checking, in Alloy, to generate small counter-examples for these deficiencies, and investigate our correction. To accomplish this, we also develop a mixed-size extension to the existing ARMv8 axiomatic model.