: Iris-Wasm: Robust and Modular Verification of WebAssembly Programs. Xiaojia Rao, Aïna Linn Georges, Maxime Legoupil, Conrad Watt, Jean Pichon-Pharabod, Philippa Gardner, and Lars Birkedal. Proc. ACM Program. Lang., 7(PLDI), June 2023. [ bib | doi | pdf | http ]
WebAssembly makes it possible to run C/C++ applications on the web with near-native performance. A WebAssembly program is expressed as a collection of higher-order ML-like modules, which are composed together through a system of explicit imports and exports using a host language, enabling a form of higher- order modular programming. We present Iris-Wasm, a mechanized higher-order separation logic building on a specification of Wasm 1.0 mechanized in Coq and the Iris framework. Using Iris-Wasm, we are able to specify and verify individual modules separately, and then compose them modularly in a simple host language featuring the core operations of the WebAssembly JavaScript Interface. Building on Iris-Wasm, we develop a logical relation that enforces robust safety: unknown, adversarial code can only affect other modules through the functions that they explicitly export. Together, the program logic and the logical relation allow us to formally verify functional correctness of WebAssembly programs, even when they invoke and are invoked by unknown code, thereby demonstrating that WebAssembly enforces strong isolation between modules.
: WasmRef-Isabelle: A Verified Monadic Interpreter and Industrial Fuzzing Oracle for WebAssembly. Conrad Watt, Maja Trela, Peter Lammich, and Florian Märkl. Proc. ACM Program. Lang., 7(PLDI), June 2023. [ bib | doi | pdf | http ]
We present WasmRef-Isabelle, a monadic interpreter for WebAssembly written in Isabelle/HOL and proven correct with respect to the WasmCert-Isabelle mechanisation of WebAssembly. WasmRef-Isabelle has been adopted and deployed as a fuzzing oracle in the continuous integration infrastructure of Wasmtime, a widely used WebAssembly implementation. Previous efforts to fuzz Wasmtime against WebAssembly's official OCaml reference interpreter were abandoned by Wasmtime's developers after the reference interpreter exhibited unacceptable performance characteristics, which its maintainers decided not to fix in order to preserve the interpreter's close definitional correspondence with the official specification. With WasmRef-Isabelle, we achieve the best of both worlds - an interpreter fast enough to be useable as a fuzzing oracle that also maintains a close correspondence with the specification through a mechanised proof of correctness. We verify the correctness of WasmRef-Isabelle through a two-step refinement proof in Isabelle/HOL. We demonstrate that WasmRef-Isabelle significantly outperforms the official reference interpreter, has performance comparable to a Rust debug build of the industry WebAssembly interpreter Wasmi, and competes with unverified oracles on fuzzing throughput when deployed in Wasmtime's fuzzing infrastructure. We also present several new extensions to WasmCert-Isabelle which enhance WasmRef-Isabelle's utility as a fuzzing oracle: we add support for a number of upcoming WebAssembly features, and fully mechanise the numeric semantics of WebAssembly's integer operations.
: MSWasm: Soundly Enforcing Memory-Safe Execution of Unsafe Code. Alexandra E. Michael, Anitha Gollamudi, Jay Bosamiya, Evan Johnson, Aidan Denlinger, Craig Disselkoen, Conrad Watt, Bryan Parno, Marco Patrignani, Marco Vassena, and Deian Stefan. Proc. ACM Program. Lang., 7(POPL):425--454, January 2023. [ bib | doi | pdf | http ]
Most programs compiled to WebAssembly (Wasm) today are written in unsafe languages like C and C++. Unfortunately, memory-unsafe C code remains unsafe when compiled to Wasm---and attackers can exploit buffer overflows and use-after-frees in Wasm almost as easily as they can on native platforms. Memory- Safe WebAssembly (MSWasm) proposes to extend Wasm with language-level memory-safety abstractions to precisely address this problem. In this paper, we build on the original MSWasm position paper to realize this vision. We give a precise and formal semantics of MSWasm, and prove that well-typed MSWasm programs are, by construction, robustly memory safe. To this end, we develop a novel, language-independent memory-safety property based on colored memory locations and pointers. This property also lets us reason about the security guarantees of a formal C-to-MSWasm compiler---and prove that it always produces memory-safe programs (and preserves the semantics of safe programs). We use these formal results to then guide several implementations: Two compilers of MSWasm to native code, and a C-to-MSWasm compiler (that extends Clang). Our MSWasm compilers support different enforcement mechanisms, allowing developers to make security-performance trade-offs according to their needs. Our evaluation shows that on the PolyBenchC suite, the overhead of enforcing memory safety in software ranges from 22% (enforcing spatial safety alone) to 198% (enforcing full memory safety), and 51.7% when using hardware memory capabilities for spatial safety and pointer integrity.
More importantly, MSWasm’s design makes it easy to swap between enforcement mechanisms; as fast (especially hardware-based) enforcement techniques become available, MSWasm will be able to take advantage of these advances almost for free.
: Isolation without taxation: near-zero-cost transitions for WebAssembly and SFI. Matthew Kolosick, Shravan Narayan, Evan Johnson, Conrad Watt, Michael LeMay, Deepak Garg, Ranjit Jhala, and Deian Stefan. Proc. ACM Program. Lang., 6(POPL):1--30, January 2022. [ bib | doi | pdf | http ]
Software sandboxing or software-based fault isolation (SFI) is a lightweight approach to building secure systems out of untrusted components. Mozilla, for example, uses SFI to harden the Firefox browser by sandboxing third-party libraries, and companies like Fastly and Cloudflare use SFI to safely co-locate untrusted tenants on their edge clouds. While there have been significant efforts to optimize and verify SFI enforcement, context switching in SFI systems remains largely unexplored: almost all SFI systems use heavyweight transitions that are not only error-prone but incur significant performance overhead from saving, clearing, and restoring registers when context switching. We identify a set of zero-cost conditions that characterize when sandboxed code has sufficient structured to guarantee security via lightweight zero-cost transitions (simple function calls). We modify the Lucet Wasm compiler and its runtime to use zero-cost transitions, eliminating the undue performance tax on systems that rely on Lucet for sandboxing (e.g., we speed up image and font rendering in Firefox by up to 29.7% and 10% respectively). To remove the Lucet compiler and its correct implementation of the Wasm specification from the trusted computing base, we (1) develop a static binary verifier, VeriZero, which (in seconds) checks that binaries produced by Lucet satisfy our zero-cost conditions, and (2) prove the soundness of VeriZero by developing a logical relation that captures when a compiled Wasm function is semantically well-behaved with respect to our zero-cost conditions. Finally, we show that our model is useful beyond Wasm by describing a new, purpose-built SFI system, SegmentZero32, that uses x86 segmentation and LLVM with mostly off-the-shelf passes to enforce our zero-cost conditions; our prototype performs on-par with the state-of-the-art Native Client SFI system.
: Mechanising and evolving the formal semantics of WebAssembly: the Web's new low-level language. Conrad Watt. PhD thesis, University of Cambridge, 2021. EAPLS Best Dissertation Award 2021, and Honorable Mention for the 2022 ACM Doctoral Dissertation Award. [ bib | doi | http ]
WebAssembly is the first new programming language to be supported natively by all major Web browsers since JavaScript. It is designed to be a natural low-level compilation target for languages such as C, C++, and Rust, enabling programs written in these languages to be compiled and executed efficiently on the Web. WebAssembly’s specification is managed by the W3C WebAssembly Working Group (made up of representatives from a number of major tech companies). Uniquely, the language is specified by way of a full pen-and-paper formal semantics.
This thesis describes a number of ways in which I have both helped to shape the specification of WebAssembly, and built upon it. By mechanising the WebAssembly formal semantics in Isabelle/HOL while it was being drafted, I discovered a number of errors in the specification, drove the adoption of official corrections, and provided the first type soundness proof for the corrected language. This thesis also details a verified type checker and interpreter, and a security type system extension for cryptography primitives, all of which have been mechanised as extensions of my initial WebAssembly mechanisation.
A major component of the thesis is my work on the specification of shared memory concurrency in Web languages: correcting and verifying properties of JavaScript’s existing relaxed memory model, and defining the WebAssembly-specific extensions to the corrected model which have been adopted as the basis of WebAssembly’s official threads specification. A number of deficiencies in the original JavaScript model are detailed. Some errors have been corrected, with the verified fixes officially adopted into subsequent editions of the language specification. However one discovered deficiency is fundamental to the model, an instance of the well-known "thin-air problem".
My work demonstrates the value of formalisation and mechanisation in industrial programming language design, not only in discovering and correcting specification errors, but also in building confidence both in the correctness of the language’s design and in the design of proposed extensions.
: Two Mechanisations of WebAssembly 1.0. Conrad Watt, Xiaojia Rao, Jean Pichon-Pharabod, Martin Bodin, and Philippa Gardner. In FM 2021. [ bib | doi | http ]
WebAssembly (Wasm) is a new bytecode language supported by all major Web browsers, designed primarily to be an efficient compilation target for low-level languages such as C/C++ and Rust. It is unusual in that it is officially specified through a formal semantics. An initial draft specification was published in 2017 with an associated mechanised specification in Isabelle/HOL published by Watt that found bugs in the original specification, fixed before its publication. The first official W3C standard, WebAssembly 1.0, was published in 2019. Building on Watt’s original mechanisation, we introduce two mechanised specifications of the WebAssembly 1.0 semantics, written in different theorem provers: WasmCert-Isabelle and WasmCert-Coq. Wasm’s compact design and official formal semantics enable our mechanisations to be particularly complete and close to the published language standard. We present a high-level description of the language’s updated type soundness result, referencing both mechanisations. We also describe the current state of the mechanisation of language features not previously supported: WasmCert-Isabelle includes a verified executable definition of the instantiation phase as part of an executable verified interpreter; WasmCert-Coq includes executable parsing and numeric definitions as on-going work towards a more ambitious end-to-end verified interpreter which does not require an OCaml harness like WasmCert-Isabelle.
: Repairing and mechanising the JavaScript relaxed memory model. Conrad Watt, Christopher Pulte, Anton Podkopaev, Guillaume Barbier, Stephen Dolan, Shaked Flur, Jean Pichon-Pharabod, and Shu-yu Guo. In PLDI 2020. [ bib | doi | http ]
Modern JavaScript includes the SharedArrayBuffer feature, which provides access to true shared memory concurrency. SharedArrayBuffers are simple linear buffers of bytes, and the JavaScript specification defines an axiomatic relaxed memory model to describe their behaviour. While this model is heavily based on the C/C++11 model, it diverges in some key areas. JavaScript chooses to give a well-defined semantics to data-races, unlike the "undefined behaviour" of C/C++11. Moreover, the JavaScript model is mixed-size. This means that its accesses are not to discrete locations, but to (possibly overlapping) ranges of bytes.
We show that the model, in violation of the design intention, does not support a compilation scheme to ARMv8 which is used in practice. We propose a correction, which also incorporates a previously proposed fix for a failure of the model to provide Sequential Consistency of Data-Race-Free programs (SC-DRF), an important correctness condition. We use model checking, in Alloy, to generate small counter-examples for these deficiencies, and investigate our correction. To accomplish this, we also develop a mixed-size extension to the existing ARMv8 axiomatic model.
Guided by our Alloy experimentation, we mechanise (in Coq) the JavaScript model (corrected and uncorrected), our ARMv8 model, and, for the corrected JavaScript model, a "model-internal" SC-DRF proof and a compilation scheme correctness proof to ARMv8. In addition, we investigate a non-mixed-size subset of the corrected JavaScript model, and give proofs of compilation correctness for this subset to x86-TSO, Power, RISC-V, ARMv7, and (again) ARMv8, via the Intermediate Memory Model (IMM).
As a result of our work, the JavaScript standards body (ECMA TC39) will include fixes for both issues in an upcoming edition of the specification.
: A Program Logic for First-Order Encapsulated WebAssembly. Conrad Watt, Petar Maksimovic, Neelakantan R. Krishnaswami, and Philippa Gardner. In ECOOP 2019. [ bib | doi | http ]
: CT-wasm: type-driven secure cryptography for the web ecosystem. Conrad Watt, John Renner, Natalie Popescu, Sunjay Cauligi, and Deian Stefan. Proc. ACM Program. Lang., 3(POPL):77:1--77:29, 2019. [ bib | doi | http ]
: Weakening WebAssembly. Conrad Watt, Andreas Rossberg, and Jean Pichon-Pharabod. Proc. ACM Program. Lang., 3(OOPSLA):133:1--133:28, 2019. [ bib | doi | http ]
: Mechanising and verifying the WebAssembly specification. Conrad Watt. In CPP 2018. [ bib | doi | http ]