Computer Laboratory Technical Reports

https://www.cl.cam.ac.uk/techreports/
Recent research reports published by the Computer Laboratory at the University of Cambridge.

A ten-year review of the Cambridge Cybercrime Centre
https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-1003.pdf
Pankow, Hannah; Hutchings, Alice; Clayton, Richard
University of Cambridge, Computer Laboratory, 2025-11

This report presents a comprehensive ten-year review of the Cambridge Cybercrime Centre (CCC), evaluating its impact on the academic research landscape. Employing a mixed-methods approach, we combine a scoping review of 203 publications with a survey of 44 dataset users and 10 in-depth interview sessions with 12 individuals, including the Centre’s leadership. The findings reveal a consistent and accelerating uptake of the Centre’s datasets, with 131 of the reviewed papers making explicit use of the data. This growth is driven by a broad, international, and interdisciplinary community. We find that the Centre’s primary contribution is the enablement of new research, particularly for postgraduate students and early-career researchers who gain access to large-scale data that would otherwise be infeasible to collect. While technical setup difficulties present a barrier for some users, the development of the PostCog analysis tool has been a critical success, serving as both an accessibility and an efficiency tool for the community. However, its impact is currently limited by an awareness gap. We conclude that the Centre’s success lies in its evolution from a data provider to a holistic research ecosystem. Its future relevance will depend on its ability to adapt its data collection to evolving online harms and to ensure the long-term viability of its critical infrastructure.

Attestable Builds: compiling verifiable binaries on untrusted systems using trusted execution environments
https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-1002.pdf
Hugenroth, Daniel; Lins, Mario; Mayrhofer, René; Beresford, Alastair R.
University of Cambridge, Computer Laboratory, 2025-10

In this paper we present attestable builds, a new paradigm for providing strong source-to-binary correspondence in software artifacts. We tackle the challenge of opaque build pipelines that break the chain of trust between source code, which can be understood and audited, and the final binary artifact, which is difficult to inspect. Our system uses modern trusted execution environments (TEEs) and sandboxed build containers to provide strong guarantees that a given artifact was correctly built from a specific source code snapshot. As such, it complements existing approaches such as reproducible builds, which typically require time-intensive modifications to existing build configurations and dependencies, and require independent parties to continuously build and verify artifacts. In comparison, an attestable build requires only minimal changes to an existing project and offers nearly instantaneous verification of the correspondence between a given binary and the source code and build pipeline used to construct it. We evaluate it by building open-source software libraries, focusing on projects that are important to the trust chain and have proven difficult to build deterministically. The overhead (a 42-second start-up latency and a 14% increase in build duration) is small compared with the overall build time. Importantly, our prototype can build complex projects such as LLVM Clang without requiring any modifications to their source code or build scripts. Finally, we formally model and verify the attestable build design to demonstrate its security against well-resourced adversaries.

Toward transient-execution attack mitigations on CHERI
https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-1001.pdf
Fuchs, Franz A.
University of Cambridge, Computer Laboratory, 2025-08

This thesis explores how to protect Capability Hardware Enhanced RISC Instructions (CHERI) systems from transient-execution attacks.
Transient-execution attacks shocked the computing world because they allow security mechanisms to be circumvented via seemingly safe performance-enhancing mechanisms. These attacks use misguided speculation to access secrets and transmit them via a side channel. Since the initial discovery of this attack class, new attacks have been discovered every year, while mitigation mechanisms have been lacking. CHERI defines architectural capabilities that help to tackle spatial and temporal memory safety issues. However, the CHERI ISA has not been designed with transient-execution vulnerabilities in mind. To satisfy performance requirements, CHERI implementations employ out-of-order and speculative execution mechanisms. The lack of ISA-level guarantees leads to multiple attack scenarios on conventional and CHERI systems. This thesis demonstrates a full-scale attack on CHERI-Toooba that breaks CHERI’s security guarantees under speculation and reliably leaks a secret value. Motivated by these findings, I developed ISA-level contracts that restrict speculative execution for both conventional and CHERI systems. As a major contribution, these contracts close a gap in architectures by providing guarantees about speculation mechanisms, which allows secure software to be built atop them. I evaluated my contracts on CHERI-Toooba, an out-of-order, superscalar implementation of CHERI-RISC-V. My contracts offer substantial security guarantees and, surprisingly, can improve both cycle performance and area usage on FPGAs. Making fine-grained compartmentalisation robust against transient-execution attacks is critical for the overall security of CHERI systems. In this work, I compare several solutions and introduce Thread ID Capability (TIDC) registers as a fast and compact solution to compartmentalisation on CHERI-RISC-V.
This work finds that, for transitions between compartments to be fast and secure, microarchitectures must not allow microarchitectural state to leak. This thesis suggests multiple approaches to separating microarchitectural state and evaluates their performance on multiple compartmentalisation models. Lastly, this thesis introduces the notion of Compartment ID (CID) sealing. In this work, I implement a necessary subset of this approach that allows out-of-order microarchitectures to keep track of current software-defined CIDs. I find this approach to have significant performance advantages over conventional speculation fences.

Application identification in data centres: a traffic driven approach
https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-1000.pdf
Popeanga, Mihnea-Stefan
University of Cambridge, Computer Laboratory, 2025-08

Modern data centre (DC) operators cannot tune and secure what they cannot see. However, application identification from network traces is held back by two obstacles: public packet captures are scarce because commercial workloads and user data are confidential, and the few datasets that exist do not focus on DC-specific workloads and do not allow others to reproduce the experiments. This dissertation tackles both issues. I designed and implemented an end-to-end framework that can systematically capture traffic with nanosecond timestamps, demultiplex flows, and compute a set of 203 features. Each flow is coupled with extensive metadata detailing the exact setup that generated the traffic, allowing any researcher to reproduce the experiments under identical conditions. Using this workflow, I created the first public DC-focused dataset, free of personal or confidential information, that spans three representative workloads. Machine learning classification techniques demonstrate the utility of the data: traditional feature-based models achieve perfect accuracy when identifying the three workloads.
A core novelty is that, beyond strict identification, the collected data includes significant metadata. To demonstrate this, I also tackled performance estimation for one of the workloads. A 1D CNN can distinguish between flows corresponding to different performance metrics with an accuracy of 95%.

CoverDrop White Paper
https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-999.pdf
Hugenroth, Daniel; Cutler, Sam; Kendrick, Dominic; Savarese, Mario; Hunter-Green, Zeke; McMahon, Philip; Kalanaki, Marjan; Vasile, Diana A.; Bejasa-Dimmock, Sabina; Hoyland, Luke; Beresford, Alastair R.
University of Cambridge, Computer Laboratory, 2025-06

The free press fulfills an important function in a democracy: it can provide individuals with a mechanism through which they can hold powerful people and organizations to account. In previous work, the University of Cambridge developed CoverDrop: a set of extensions to a typical news app that provided a secure and usable method of establishing initial contact between journalists and sources. Since publication, The Guardian and the University of Cambridge have undertaken further work on the design, highlighting additional challenges and shortcomings that needed to be addressed before deployment. This white paper presents an updated design of the CoverDrop system that addresses these issues, and describes the version that The Guardian first deployed in April 2025.

Modelling orchestration
https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-998.pdf
Jeffery, Andrew
University of Cambridge, Computer Laboratory, 2025-05

Modern cloud services operate at significant and increasing scale. The growth of these services has led to the need for automated management to keep them operational across many thousands of nodes and multiple geo-distributed sites. Orchestrators are the platforms designed to automate this management and standardise the workflows involved.
The significant uptake of modern orchestrators means that they have expanded their scope beyond private datacenters, into the public cloud, and now even towards the edge of the network. These are environments for which they were not designed, and while these environments share some characteristics with private datacenters, the differences are significant enough to require rethinking the design of the orchestrators. In this dissertation, I examine orchestrator design, focusing on the global state they maintain in their central datastores. To do this, I propose a definition of the orchestration problem and provide a lightweight formalisation using model checking. I use this model to explore the properties of an existing orchestrator, explaining observed failures arising from changes in the consistency model. I then explore the impact of variations in the consistency model of the global state on the properties and performance of the model checking. Using insights from this model and its consistency analysis, I then propose two new datastores to support the control plane of orchestration platforms: one for the public cloud and one for the near-edge. In the public cloud, data confidentiality is paramount, so the datastore aims to minimise the actors within the trust boundary to enable secure, trusted deployments. For the near-edge, I focus on the availability of a single cluster, enabling individual locations to process requests without reliance on persistent non-local communication. Together, these components (the model and the two datastores) enable orchestration platforms to be optimised for their environments, enabling more widespread use.

CHERI-SIMT report: implementing capability memory protection in GPGPUs
https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-997.pdf
Naylor, Matthew; Joannou, Alexandre; Markettos, A. Theodore; Metzger, Paul; Moore, Simon W.; Jones, Timothy M.
University of Cambridge, Computer Laboratory, 2025-03

Governments are increasingly advising software manufacturers to employ memory-safe languages and technologies to combat adversarial attacks on modern computing infrastructure. This introduces pressures across the entire computing industry, including GPGPU vendors who provide implementations of unsafe C/C++-based languages, such as CUDA and OpenCL, for programming the devices they produce. One of the memory-safety technologies being recommended is Capability Hardware Enhanced RISC Instructions (CHERI). CHERI builds strong and efficient memory safety into underlying instruction-set architectures, allowing continued, but memory-safe, use of C/C++-based languages on top. Another recommended option is Rust, a memory-safe systems programming language that can viably replace C/C++ in some cases. In this report, we evaluate the feasibility of incorporating CHERI into GPGPU architectures by extending a prototype, open-source, synthesisable SIMT core and CUDA-like programming environment with support for CHERI. We present techniques that considerably reduce the costs of CHERI in SIMT designs: register-file storage overheads fall from 103% to 7%, logic-area overheads fall by 44% to a cost comparable to one additional multiplier per vector lane, and execution-time overheads fall to 1.6%. By comparison, an experimental Rust port of the same GPGPU benchmark suite shows a 34% increase in execution time due to software bounds checking. With the proposed techniques, CHERI offers a viable path to strong and efficient GPGPU memory safety while avoiding the need to replace established programming practices.

It is time to standardize principles and practices for software memory safety (extended version)
https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-996.pdf
Watson, Robert N. M.; Baldwin, John; Chen, Tony; Chisnall, David; Clarke, Jessica; Davis, Brooks; Filardo, Nathaniel Wesley; Gutstein, Brett; Jenkinson, Graeme; Kern, Christoph; Laurie, Ben; Mazzinghi, Alfredo; Moore, Simon W.; Neumann, Peter G.; Okhravi, Hamed; Rebert, Alex; Richardson, Alex; Sewell, Peter; Tratt, Laurence; Vijayaraghavan, Murali; Vincent, Hugo; Witaszczyk, Konrad
University of Cambridge, Computer Laboratory, 2025-02

This is the extended version of the paper “It is time to standardize principles and practices for software memory safety”, which appeared in the February 2025 issue of Communications of the ACM. In this report, we explore memory-safety standardization, which we argue is an essential step to promoting universal strong memory safety in government and industry and, in turn, to ensuring access to more secure software for all. Over the last two decades, a set of four research technologies for strong memory safety – memory-safe systems languages, hardware and software memory protection, formal approaches, and software compartmentalization – have reached sufficient maturity to see early deployment in security-critical use cases. However, there remains no shared, technology-neutral terminology or framework with which to specify memory-safety requirements. Such a framework is needed to enable reliable specification, design, implementation, auditing, and procurement of strongly memory-safe systems. Failure to speak a common language makes it difficult to understand the possibilities or to communicate accurately with one another, limiting perceived benefits and hence actual demand. The lack of such a framework also impedes potential future policy interventions and, in turn, the stating of requirements to address observed market failures that prevent adoption of these technologies. Standardization would also play a critical role in improving industrial best practice, another key aspect of adoption.
We begin with an overview of the many techniques – from hardware to software to formal theories – that have been developed and redefined over several decades, and how each plays a part in moving us towards strong memory safety. We explore how these technologies can be differentiated, considering differences in both functional protection and strength. We discuss how adoption barriers and potential market failures have limited adoption, and how the standardization gap limits potential interventions. We propose potential approaches to standardization – likely a task not limited to any one institution or standards body – and conclude with an illustrative universal memory-safety adoption timeline proposing a realistic path to universal adoption given suitable incentivization.

Fragment-template power-analysis attacks against microcontroller implementations of the 32-bit stream cipher ChaCha
https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-995.pdf
Batchelor, Henry
University of Cambridge, Computer Laboratory, 2024-07

ChaCha is a widely adopted stream cipher, used for both random number generation and encryption. I propose a factor graph of ChaCha to improve the success rate of side-channel attacks that provide leakages throughout the entire execution of the algorithm. I also assess (fragment) template attacks against several implementations of ChaCha to demonstrate that the factor graph is helpful when working with actual side-channel attacks. These attacks could fully recover the correct key from an 8-bit implementation. In contrast, a 32-bit implementation, with most of the state held in registers, was significantly more challenging to attack: an adversary with access to 10 power traces and an incremented counter could achieve a success rate of 14.6%. For a 32-bit implementation with substantial SRAM activity, an attacker could successfully recover the key from a single trace in 2.6% of cases.
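Template attacks on ChaCha target the 32-bit intermediate words produced by its add-rotate-xor quarter round. As a minimal sketch, the code below implements the quarter round as specified in RFC 8439 (section 2.1), records every intermediate word, and maps each one to its Hamming weight, a common simplified power-leakage model. The leakage model and the helper names (`quarter_round`, `hamming_weight`) are assumptions for illustration; they are not taken from the report and do not reproduce its factor graph.

```python
"""Sketch: ChaCha quarter round (RFC 8439, section 2.1) with simulated leakage."""
from typing import List, Optional, Tuple

MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def quarter_round(a: int, b: int, c: int, d: int,
                  trace: Optional[List[int]] = None) -> Tuple[int, int, int, int]:
    """Apply one ChaCha quarter round, appending each intermediate word to trace."""
    def record(x: int) -> int:
        if trace is not None:
            trace.append(x)
        return x

    a = record((a + b) & MASK32)
    d = record(rotl32(d ^ a, 16))
    c = record((c + d) & MASK32)
    b = record(rotl32(b ^ c, 12))
    a = record((a + b) & MASK32)
    d = record(rotl32(d ^ a, 8))
    c = record((c + d) & MASK32)
    b = record(rotl32(b ^ c, 7))
    return a, b, c, d


def hamming_weight(x: int) -> int:
    """Count set bits: a simplified proxy for the power drawn when x is processed."""
    return bin(x).count("1")


if __name__ == "__main__":
    trace: List[int] = []
    out = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567, trace)
    print([hex(w) for w in out])  # → ['0xea2a92f4', '0xcb1cf8ce', '0x4581472e', '0x5881c4bb']
    print([hamming_weight(w) for w in trace])  # one simulated leakage value per step
```

The inputs are the quarter-round test vector from RFC 8439 (section 2.1.1). In an 8-bit implementation each of these words would be processed as several byte-sized pieces, which gives an attacker more, smaller leakage points; that is one intuition for why the abstract reports the 8-bit target as easier.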

Deception and defense from machine learning to supply chains
https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-994.pdf
Boucher, Nicholas
University of Cambridge, Computer Laboratory, 2024-05

Broad classes of modern cyberattacks are dependent upon their ability to deceive human victims. Given the ubiquity of text across modern computational systems, we present and analyze a set of techniques that attack the encoding of text to produce deceptive inputs to critical systems. By targeting a core building block of modern systems, we can adversarially manipulate dependent applications ranging from natural language processing pipelines to search engines to code compilers. Left undefended, these vulnerabilities enable many ill effects including uncurtailed online hate speech, disinformation campaigns, and software supply chain attacks. We begin by generating adversarial examples for text-based machine learning systems. Due to the discrete nature of text, adversarial examples for text pipelines have traditionally involved conspicuous perturbations compared to the subtle changes of the more continuous visual and auditory domains. Instead, we propose imperceptible perturbations: techniques that manipulate text encodings without affecting the text in its rendered form. We use these techniques to craft the first set of adversarial examples for text-based machine learning systems that are human-indistinguishable from their unperturbed form, and demonstrate their efficacy against systems ranging from machine translation to toxic content detection. We also describe a set of defenses against these techniques. Next, we propose a new attack setting which we call adversarial search. In this setting, an adversary seeks to manipulate the results of search engines to surface certain results only and consistently when a hidden trigger is detected. We accomplish this by applying the encoding techniques of imperceptible perturbations to both indexed content and queries in major search engines.
We demonstrate that imperceptibly encoded triggers can be used to manipulate the results of current commercial search engines, and then describe a social engineering attack exploiting this vulnerability that can be used to power disinformation campaigns. Again, we describe a set of defenses against these techniques. We then look to compilers and propose a different set of text perturbations which can be used to craft deceptive source code. We exploit the bidirectional nature of modern text standards to embed directionality control characters into comments and string literals. These control characters allow attackers to shuffle the sequence of tokens rendered in source code, and in doing so to implement programs that appear to do one thing when rendered to human code reviewers, but to do something different from the perspective of the compiler. We dub this technique the Trojan Source attack, and demonstrate the vulnerability of C, C++, C#, JavaScript, Java, Rust, Go, Python, SQL, Bash, Assembly, and Solidity. We also explore the applicability of this attack technique to launching supply chain attacks, and propose defenses that can be used to mitigate this risk. We also describe and analyze a 99-day coordinated disclosure that yielded patches to dozens of market-leading compilers, code editors, and code repositories. Finally, we propose a novel method of identifying software supply chain attacks that works not only for Trojan Source attacks, but for most forms of supply chain attacks. We describe an extension to compilers dubbed the Automated Bill of Materials, or ABOM, which embeds dependency metadata into compiled binaries. Specifically, hashes of each source code file consumed by a compiler are embedded into its emitted binary, and these hashes are included recursively into all downstream dependencies. 
They are stored in a highly space- and time-efficient probabilistic data structure that requires an expected 2.1 bytes to represent each unique dependency source code file. With ABOMs, it becomes possible to detect all naturally occurring and most adversarially induced vulnerabilities used for supply chain attacks in downstream software by querying binaries for the presence of poisoned dependencies, without the need to locate tangible indicators of compromise. In this thesis, we therefore demonstrate how weaknesses in a core building block of modern systems – text encodings – can cause failures in a wide range of domains including machine learning, search engines, and source code. We propose defenses against each variant of our attack, including a new tool to identify most generic software supply chain attacks. We believe that these techniques will be useful in securing software ecosystems against the next generation of attacks.

Distributional and relational inductive biases for graph representation learning in biomedicine
https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-993.pdf
Scherer, Paul M.
University of Cambridge, Computer Laboratory, 2024-04

The immense complexity with which DNAs, RNAs, proteins and other biomolecules interact amongst themselves, with one another, and with the environment to bring about life processes motivates the mass collection of biomolecular data and data-driven modelling to gain insights into physiological phenomena. Recent predictive modelling efforts have focused on deep representation learning methods, which offer a flexible modelling paradigm for handling high-dimensional data at scale and incorporating inductive biases. The emerging field of representation learning on graph-structured data opens opportunities to leverage the abundance of structured biomedical knowledge and data to improve model performance.
Grand international initiatives have been coordinated to organise and structure our growing knowledge about the interactions and putative functions of biomolecular entities using graphs and networks. This dissertation considers how we may use the inductive biases within recent graph representation learning methods to leverage these structures and incorporate biologically relevant relational priors into machine learning methods for biomedicine. We present contributions in two parts, with the aim of fostering research in this multidisciplinary domain and presenting novel methods that achieve strong performance through the use of distributional and relational inductive biases operating on graph-structured biomedical knowledge and data. The first part is concerned with consolidating and expanding the current ecosystem of practical frameworks dedicated to graph representation learning. Our first contribution presents Geo2DR, the first practical framework and software library for constructing methods capable of learning distributed representations of graphs. Our second contribution, PyTorch Geometric Temporal, is the first open-source representation learning library for dynamic graphs, expanding the scope of research software on graph neural networks, which had previously been limited to static graphs. The second part presents three methods, each of which tackles an active biomedical research problem using relational structures that exist within different aspects of the data. First, we present a methodology for learning distributed representations of molecular graphs in the context of drug pair scoring. Next, we present a method for leveraging structured knowledge about the variables of gene expression profiles to automatically construct sparse neural models for cancer subtyping. Finally, we present a state-of-the-art cell deconvolution model for spatial transcriptomics data that uses the positional relationships between observations in the dataset.
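Methods that learn distributed representations of whole graphs (graph2vec is a well-known example) commonly treat each graph as a "document" of discrete substructure patterns and then train a skip-gram-style model over those patterns. As a rough sketch of that first decomposition step, the code below applies Weisfeiler-Lehman subtree relabeling to a toy molecular graph. The function name and the readable string labels are illustrative assumptions (practical implementations usually compress labels by hashing), and this is not presented as Geo2DR's interface.

```python
"""Sketch: Weisfeiler-Lehman relabeling as a graph-to-'document' decomposition.

Hypothetical helper for illustration only; not part of Geo2DR's API.
"""
from collections import Counter
from typing import Dict, List


def wl_subtree_patterns(adj: Dict[int, List[int]],
                        labels: Dict[int, str],
                        iterations: int = 2) -> Counter:
    """Return the multiset of WL subtree labels seen over all iterations.

    Each round relabels every node with its own label followed by the sorted
    labels of its neighbours, so after k rounds a label encodes a rooted
    subtree of depth k.
    """
    current = dict(labels)
    patterns = Counter(current.values())
    for _ in range(iterations):
        # The comprehension reads only the previous round's labels.
        current = {
            v: current[v] + "(" + ",".join(sorted(current[u] for u in nbrs)) + ")"
            for v, nbrs in adj.items()
        }
        patterns.update(current.values())
    return patterns


if __name__ == "__main__":
    # Toy input (hypothetical): heavy-atom graph of ethanol, C-C-O.
    adj = {0: [1], 1: [0, 2], 2: [1]}
    labels = {0: "C", 1: "C", 2: "O"}
    print(wl_subtree_patterns(adj, labels, iterations=1))
```

The resulting bag of patterns plays the role of the words in a document; a downstream embedding model then learns one vector per graph from these bags.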

Transient execution vulnerabilities in the security context of server hardware
https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-992.pdf
Randal, Allison
University of Cambridge, Computer Laboratory, 2023-12

Many mitigations have been proposed and implemented for many variants of the transient execution vulnerabilities, and while the Meltdown-type exception-based transient execution vulnerabilities have proven to be tractable, Spectre-type vulnerabilities and other speculation-based transient execution vulnerabilities have been far more resistant to countermeasures. For smaller-scale embedded systems or security-focused hardware such as a cryptographic system or a root-of-trust (RoT), eliminating speculation is widely accepted as a reasonable approach to improving security. However, for larger-scale and general-purpose hardware, eliminating speculation is often dismissed as inconceivable, though the claim that speculation is required for adequate performance is rarely supported by concrete performance results. The performance results we do have, from several independent strands of research over the past few decades, have shown that speculation features on large-scale server hardware do not offer the same performance advantages as on smaller-scale hardware, so eliminating speculation on large-scale server hardware does not harm performance as much as we might suspect. Moreover, selective speculation techniques have shown that speculation-based transient execution vulnerabilities can be mitigated by a partial elimination of speculation, so we can preserve some of the performance of speculation while subduing the security risk.
In order to demonstrate that eliminating speculation is a feasible approach to mitigating the transient execution vulnerabilities on large-scale server hardware, this work considers three alternative approaches that partially or completely eliminate speculative execution: heterogeneous multicore systems combining speculative and non-speculative cores; entirely non-speculative microarchitectures; and selective speculation microarchitectures.

Porting a mix network client to mobile
https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-991.pdf
Kung, Jacky W. E.
University of Cambridge, Computer Laboratory, 2023-12

This project set out to investigate the feasibility of mix network clients in the mobile ecosystem. It considers the Android operating system and Nym, a production-grade mix network infrastructure based on the abstract Loopix architecture first presented in 2017. The goal of the project was to produce a minimal working prototype and to present an evaluation of the trade-offs necessary for an efficient implementation in the Android ecosystem. Nym’s client codebase, written in Rust, has been successfully ported to Android after adjusting parts of the code and constructing the compilation toolchain. An exploration of the performance effects of compilation parameters and mixnet parameters is presented. Two extension tasks were completed: a semi-automated compilation pipeline, and further evaluation using measurements taken with the custom hardware provided by my supervisor. As a side product, the repository also contains a Rust library that provides a friendly interface between Rust and Kotlin code.
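In the Loopix design that Nym builds on, each mix node holds a message for an independent, exponentially distributed delay before forwarding it, and the mean delay is one of the mixnet parameters that trades latency for anonymity. The sketch below estimates the resulting end-to-end mixing delay by Monte Carlo simulation. The function names, hop count and delay values are hypothetical; the abstract does not say which parameters the project varied or what values Nym uses.

```python
"""Sketch: Loopix-style per-hop exponential mixing delays (hypothetical parameters)."""
import random
from statistics import mean


def path_delay(hops: int, mean_delay_s: float, rng: random.Random) -> float:
    """Total mixing delay of one packet: each mix holds it for an independent
    exponentially distributed time with the given mean (rate = 1 / mean)."""
    return sum(rng.expovariate(1.0 / mean_delay_s) for _ in range(hops))


def mean_path_delay(hops: int, mean_delay_s: float, samples: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected end-to-end mixing delay."""
    rng = random.Random(seed)
    return mean(path_delay(hops, mean_delay_s, rng) for _ in range(samples))


if __name__ == "__main__":
    # Hypothetical settings: 3 mix layers and a few candidate mean per-hop delays.
    for mu in (0.01, 0.05, 0.2):
        est = mean_path_delay(hops=3, mean_delay_s=mu, samples=20_000)
        print(f"mean per-hop delay {mu:.2f}s -> estimated path delay {est:.3f}s "
              f"(expected {3 * mu:.2f}s)")
```

Because the per-hop delays are independent, the expected path delay is simply the number of hops times the mean per-hop delay; the simulation only confirms this and shows the spread around it, which is what a user on a mobile connection would experience as added latency.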

Probing the foundations of neural algorithmic reasoning
https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-990.pdf
Ong, Euan
University of Cambridge, Computer Laboratory, 2023-12

While the field of neural algorithmic reasoning (NAR), which trains neural networks to imitate algorithms and uses them as algorithmic inductive biases in real-world problems, has risen in popularity, there has been no investigation confirming that its fundamental claims hold in general. Indeed, we argue that such an investigation has so far been infeasible due to the lack of a general, extensible library, which creates a very high barrier to entry for reproductions and systematic studies. As such, we develop an extensible laboratory for NAR by introducing a novel framework for multi-domain, type-driven, declarative ML, and using its components to derive flexible NAR pipelines from first principles through the paradigm of representations-as-types. We use this laboratory to perform systematic analyses, reproductions and comparisons of prior work in NAR, matching (and often beating) state-of-the-art performance across various domains by identifying and alleviating bottlenecks across popular NAR frameworks and architectures. We then conduct a systematic investigation into the fundamental claims of NAR, in the context of a new synthetic dataset inspired by recent work in neural algorithmics. Through a series of statistically robust ablation tests, we confirm the established result that algorithmic modules beat non-algorithmic baselines, but we find evidence to refute one of the central claims of NAR, showing that neural algorithmic processors (NAPs) do not overcome the ‘scalar bottleneck’ of differentiable algorithmic black-boxes (sDABs).
Based on our observations, we develop a new hypothesis to replace this claim: that sDABs instead suffer from an ‘ensembling bottleneck’, being unable to execute multiple instances of the same algorithm in parallel, which is alleviated not by NAPs but simply by using an unfrozen, structurally aligned neural network. By exploring the effects of parallelising sDABs, we not only find strong evidence in support of this hypothesis, but also achieve a long-standing goal of neural algorithmics: a way to deterministically distill an algorithm into a robust, high-dimensional processor network that preserves both the efficiency and correctness guarantees of sDABs while avoiding their performance bottleneck.

Balanced allocations under incomplete information: New settings and techniques
https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-989.pdf
Los, Dimitrios
University of Cambridge, Computer Laboratory, 2023-11

In the balanced allocations framework, there are 𝑚 balls to be allocated into 𝑛 bins with the aim of minimising the maximum load of any of the bins, or equivalently minimising the “gap”, i.e., the difference between the maximum load and the average load. In this dissertation, we focus on the “heavily-loaded case” where 𝑚 ≫ 𝑛, which tends to be more challenging to analyse. In a decentralised setting, the simplest process is One-Choice, which allocates each ball to a bin sampled uniformly at random. It is well known that w.h.p. Gap(𝑚) = Θ(√(𝑚/𝑛 · log 𝑛)) for any 𝑚 ≫ 𝑛. A great improvement over this is the Two-Choice process [ABKU99, KLM96], which allocates each ball to the less loaded of two bins sampled uniformly at random. Berenbrink, Czumaj, Steger, and Vöcking (2006) showed that w.h.p. Gap(𝑚) = log₂ log 𝑛 + Θ(1) for any 𝑚 ≥ 𝑛. This improvement is known as the “power of two choices”.
It has found several applications in hashing, load balancing and routing, and its importance was recently recognised with the 2020 ACM Theory and Practice Award. In this dissertation, we introduce a set of techniques based on “potential functions”. These enable us to analyse (both in terms of gap and load distribution) a wide range of processes and settings in the heavily-loaded case and to establish interesting insights in the balanced allocations framework:
• We analyse variants of the Two-Choice process that trade off sample efficiency, completeness of information and gap guarantees. For the (1+β)-process, which mixes One-Choice and Two-Choice with probability β ∈ (0, 1], we prove tight bounds for small and large β, extending the results of Peres, Talwar and Wieder (2015). Another sample-efficient family is that of Two-Thinning processes, which allocate to the two sampled bins in an online manner. For Two-Thinning processes that use as a decision function thresholds relative to the average load or thresholds in the rank domain, we establish tight bounds and also resolve a conjecture by Feldheim and Gurel-Gurevich (2021). We also quantify trade-offs for two-sample processes between the number of queries and the gap bound, establishing a “power of two queries” phenomenon.
• We analyse the Two-Choice process with random, adversarial and delay noise, proving tight bounds for various settings. In the adversarial setting, the adversary can decide which of the two sampled bins the ball is allocated to, but only when the two loads differ by at most 𝑔. The analysis of this setting implies bounds for settings with random noise and delay. For the setting where load information is updated periodically every 𝑏 steps, for 𝑏 = 𝑛 we tighten the bound of [BCEFN12] to Θ(log 𝑛 / log log 𝑛) and prove that Two-Choice is optimal in this setting for any 𝑏 in [𝑛 · exp(−logᶜ 𝑛), 𝑛 log 𝑛] for any constant 𝑐 > 0. For 𝑏 in [𝑛 log 𝑛, 𝑛³], we show that Two-Choice achieves w.h.p. a Θ(𝑏/𝑛) gap, while, surprisingly, the (1+β)-process with appropriately chosen β achieves w.h.p. a Θ(√(𝑏/𝑛 · log 𝑛)) gap, which is optimal over a large family of processes. This proves that in the presence of outdated information, less aggressive strategies can outperform greedy processes (such as Two-Choice), as has been empirically observed in the queuing setting [D00, M00] for centralised processes since 2000, but to the best of our knowledge has not previously been formally proven.
• Next, we analyse Two-Choice in the graphical setting, where bins are vertices of a graph and each ball is allocated to the less loaded of the two endpoints of a randomly sampled edge. We extend the results of Kenthapadi and Panigrahy (2006), proving that for dense expanders in the heavily-loaded case the gap is w.h.p. O(log log 𝑛). In the presence of weights, we make progress towards [Open Problem 1, PTW15] by proving that for graphs with conductance φ, the gap is w.h.p. O(log 𝑛 / φ).
• Further, we introduce and analyse processes that can allocate more than one ball to a sampled bin. We prove that these processes achieve w.h.p. an O(log 𝑛) gap (which also applies for any 𝑑-regular graph), while still being more sample-efficient than One-Choice (“power of filling”).
• For the Memory process, which can store bins in a cache, we generalise the O(log log 𝑛) gap bound by Mitzenmacher, Prabhakar and Shah (2002) to the heavily-loaded case and prove a matching lower bound. Further, in the presence of heterogeneous sampling distributions, we establish a striking difference between Two-Choice (or even 𝑑-Choice with 𝑑 = O(1)) and Memory, showing that for the latter the gap is bounded, while for the former it is known to diverge [W07] (“power of memory”).
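To make the contrast between One-Choice, Two-Choice and the (1+β)-process concrete, the sketch below simulates all three in the heavily-loaded case. It illustrates the processes only; it is not the dissertation's potential-function analysis, and the bin count, ball count, β value and seed are arbitrary choices. Ties in the Two-Choice step are broken in favour of the first sample, which is one of several valid conventions.

```python
"""Sketch: simulating One-Choice, Two-Choice and the (1+beta)-process."""
import random
from typing import List


def allocate(m: int, n: int, beta: float, seed: int = 0) -> List[int]:
    """Allocate m balls into n bins and return the final loads.

    With probability beta a ball goes to the less loaded of two uniformly
    sampled bins (a Two-Choice step); otherwise it goes to one uniformly
    sampled bin (a One-Choice step). beta = 0 is One-Choice, beta = 1 is
    Two-Choice.
    """
    rng = random.Random(seed)
    loads = [0] * n
    for _ in range(m):
        i = rng.randrange(n)
        if rng.random() < beta:
            j = rng.randrange(n)
            if loads[j] < loads[i]:  # ties keep the first sample
                i = j
        loads[i] += 1
    return loads


def gap(loads: List[int]) -> float:
    """Maximum load minus average load."""
    return max(loads) - sum(loads) / len(loads)


if __name__ == "__main__":
    n = 1_000
    m = 50 * n  # heavily-loaded case, m >> n (arbitrary choice)
    for name, beta in [("One-Choice", 0.0), ("(1+beta), beta=0.5", 0.5), ("Two-Choice", 1.0)]:
        print(f"{name:>20}: gap = {gap(allocate(m, n, beta)):.1f}")
```

Per the bounds quoted above, the One-Choice gap should grow like √(𝑚/𝑛 · log 𝑛) as more balls arrive, while the Two-Choice gap should stay near log₂ log 𝑛 plus a constant; the exact simulated values depend on the seed.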