<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
>
  <channel rdf:about="https://www.cl.cam.ac.uk/techreports/">
    <title>Computer Laboratory Technical Reports</title>
    <link>https://www.cl.cam.ac.uk/techreports/</link>
    <description>Recent research reports published by the Computer Laboratory at the University of Cambridge.</description>
    <items>
      <rdf:Seq>
        <rdf:li resource="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-1005.html" />
        <rdf:li resource="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-1004.html" />
        <rdf:li resource="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-1003.html" />
        <rdf:li resource="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-1002.html" />
        <rdf:li resource="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-1001.html" />
        <rdf:li resource="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-1000.html" />
        <rdf:li resource="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-999.html" />
        <rdf:li resource="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-998.html" />
        <rdf:li resource="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-997.html" />
        <rdf:li resource="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-996.html" />
        <rdf:li resource="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-995.html" />
        <rdf:li resource="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-994.html" />
        <rdf:li resource="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-993.html" />
        <rdf:li resource="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-992.html" />
        <rdf:li resource="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-991.html" />
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-1005.html">
    <title>Neural representation of Bidirectional Reflectance Distribution Function</title>
    <link>https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-1005.pdf</link>
    <dc:creator>Hu, Zheyuan</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2026-03</dc:date>
    <description>
        Despite the advent of neural rendering, specifically supervised
        models trained on Bidirectional Reflectance Distribution
        Function (BRDF) for appearance modelling, there is limited
        understanding of their effectiveness, efficiency, and utility
        for downstream research.
        
        In this work, I designed implicit neural representations of
        BRDFs and evaluated their real-world performance relative to
        classical models, with extensions to sparse-sample
        reconstruction and multi-modal material synthesis. I further
        investigated importance sampling strategies in the rendering
        pipeline, alongside both supervised and generative methods.
        
        Building upon these contributions, I implemented a novel
        multi-modal generative pipeline and proposed new quantitative
        metrics for material synthesis, addressing a long-standing gap
        in the evaluation of neural materials.
    </description>
  </item>
  <item rdf:about="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-1004.html">
    <title>Contention-resilient overcommitment for serverless deployments</title>
    <link>https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-1004.pdf</link>
    <dc:creator>Isstaif, Al Amjad Tawfiq</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2026-02</dc:date>
    <description>
        Serverless computing, or Function-as-a-Service (FaaS),
        simplifies application deployment by allowing developers to
        focus on functionality while the service provider manages the
        underlying resources. This model charges users based on actual
        resource usage, making it ideal for intermittent workloads,
        unlike Infrastructure-as-a-Service (IaaS), where users pay for
        idle virtual machines. Thus, serverless computing shifts the
        cost of idle resources to the service provider, who must manage
        the underlying server capacity. In container-based serverless
        platforms, idle functions are often kept active to avoid the
        costs associated with initialising containers, or “cold starts”,
        which exacerbates resource under-utilisation.
        
        Overcommitting resources is a strategy to enhance resource
        utilisation by leveraging statistical multiplexing, where
        multiple workloads are unlikely to reach peak usage
        simultaneously. This approach allows for greater workload
        colocation and reduces server costs by pooling resources rather
        than over-allocating resources per function. However, service
        providers often adopt conservative overcommitment policies to
        avoid performance degradation and workload disruption, as higher
        levels of colocation increase the risk of contention. In
        container-based serverless systems, the high density of function
        containers—due to their small resource footprint—limits the
        benefits of overcommitment because it increases kernel
        scheduling overhead when managing a large number of colocated
        containers.
        
        This dissertation proposes a host-focused approach to addressing
        the workload colocation problem in the context of
        serverless-style Linux container workloads and CPU resources.
        CPUs can be the bottleneck for many workloads and the efficiency
        of CPU multiplexing can limit the degree of workload colocation
        that can be achieved. The central thesis of this dissertation is
        that mitigating the CPU scheduling overhead associated with
        workload colocation unlocks additional capacity in a
        serverless cluster. The primary contribution is an enhanced
        Linux CPU scheduler that enables a larger number of containers to
        share the CPU resources of a single server while robustly
        managing the increased risk of contention. This contribution is
        evaluated within a cluster resource allocation framework,
        demonstrating that higher levels of workload colocation are
        achievable thanks to the proposed scheduler.
    </description>
  </item>
  <item rdf:about="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-1003.html">
    <title>A ten-year review of the Cambridge Cybercrime Centre</title>
    <link>https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-1003.pdf</link>
    <dc:creator>Pankow, Hannah</dc:creator>
    <dc:creator>Hutchings, Alice</dc:creator>
    <dc:creator>Clayton, Richard</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2025-11</dc:date>
    <description>
        This report presents a comprehensive ten-year review of the
        Cambridge Cybercrime Centre (CCC), evaluating its impact on the
        academic research landscape. Employing a mixed-methods approach,
        we combine a scoping review of 203 publications with a survey of
        44 dataset users and 10 in-depth interview sessions with 12
        individuals, including the Centre’s leadership. The findings
        reveal a consistent and accelerating uptake of the Centre’s
        datasets, with 131 of the reviewed papers making explicit use of
        the data. This growth is driven by a broad, international, and
        interdisciplinary community. We find the Centre’s primary
        contribution is the enablement of new research, particularly for
        postgraduate students and early-career researchers who gain
        access to large-scale data that would otherwise be infeasible to
        collect. While technical setup difficulties present a barrier
        for some users, the development of the PostCog analysis tool has
        been a critical success, serving as both an accessibility and
        efficiency tool for the community. However, its impact is
        currently limited by an awareness gap. We conclude that the
        Centre’s success lies in its evolution from a data provider to a
        holistic research ecosystem. Its future relevance will depend on
        its ability to adapt its data collection to evolving online
        harms and to ensure the long-term viability of its critical
        infrastructure.
    </description>
  </item>
  <item rdf:about="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-1002.html">
    <title>Attestable Builds: compiling verifiable binaries on untrusted systems using trusted execution environments</title>
    <link>https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-1002.pdf</link>
    <dc:creator>Hugenroth, Daniel</dc:creator>
    <dc:creator>Lins, Mario</dc:creator>
    <dc:creator>Mayrhofer, René</dc:creator>
    <dc:creator>Beresford, Alastair R.</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2025-10</dc:date>
    <description>
        In this paper we present attestable builds, a new paradigm to
        provide strong source-to-binary correspondence in software
        artifacts. We tackle the challenge of opaque build pipelines
        that disconnect the trust between source code, which can be
        understood and audited, and the final binary artifact which is
        difficult to inspect. Our system uses modern trusted execution
        environments (TEEs) and sandboxed build containers to provide
        strong guarantees that a given artifact was correctly built from
        a specific source code snapshot. As such it complements existing
        approaches like reproducible builds which typically require
        time-intensive modifications to existing build configurations
        and dependencies, and require independent parties to
        continuously build and verify artifacts. In comparison, an
        attestable build requires only minimal changes to an existing
        project, and offers nearly instantaneous verification of the
        correspondence between a given binary and the source code and
        build pipeline used to construct it. We evaluate it by building
        open-source software libraries—focusing on projects which are
        important to the trust chain and have proven difficult to
        build deterministically. The overhead (42 seconds start-up
        latency and 14% increase in build duration) is small in
        comparison to the overall build time. Importantly, our prototype
        can build complex projects such as LLVM Clang without requiring
        any modifications to their source code and build scripts.
        Finally, we formally model and verify the attestable build
        design to demonstrate its security against well-resourced
        adversaries.
    </description>
  </item>
  <item rdf:about="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-1001.html">
    <title>Toward transient-execution attack mitigations on CHERI</title>
    <link>https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-1001.pdf</link>
    <dc:creator>Fuchs, Franz A.</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2025-08</dc:date>
    <description>
        This thesis explores how to protect Capability Hardware Enhanced
        RISC Instructions (CHERI) systems from transient-execution
        attacks. Transient-execution attacks shocked the computing world
        as they allow security mechanisms to be circumvented via
        seemingly safe performance-enhancing mechanisms. These attacks
        use misguided speculation to access secrets and transmit them
        via a side channel. Since the initial discovery of this attack
        class, every year has seen fresh attacks discovered, while
        mitigation mechanisms remain lacking.
        
        CHERI defines architectural capabilities that help to tackle
        spatial and temporal memory safety issues. However, the CHERI
        ISA has not been designed with transient-execution
        vulnerabilities in mind. In order to satisfy performance
        requirements, CHERI implementations employ out-of-order and
        speculative execution mechanisms. The lack of ISA-level
        guarantees leads to multiple attack scenarios on conventional
        and CHERI systems. This thesis demonstrates a full-scale attack
        on CHERI-Toooba that manages to break CHERI’s security
        guarantees in speculation and reliably leaks a secret value.
        Motivated by these findings, I developed ISA-level contracts
        that restrict speculative execution for both conventional and
        CHERI systems. As a major contribution, these contracts close a
        gap in architectures by giving guarantees about speculation
        mechanisms, which allows secure software to be built atop these
        contracts. I evaluated my contracts on CHERI-Toooba, which is an
        out-of-order, superscalar implementation of CHERI-RISC-V. My
        contracts offer substantial security guarantees and can
        surprisingly lead to improvements in both cycle performance and
        area usage on FPGAs.
        
        Making fine-grained compartmentalisation robust against
        transient-execution attacks is critical for the overall security
        of CHERI systems. In this work, I compare multiple different
        solutions and introduce Thread ID Capability (TIDC) registers as
        a fast and compact solution to compartmentalisation on
        CHERI-RISC-V. In order to facilitate fast and secure transitions
        between compartments, this work finds that microarchitectures
        must not allow microarchitectural state to leak. This thesis
        suggests multiple approaches to separating microarchitectural
        state and evaluates their performance on multiple
        compartmentalisation models. Last, this thesis introduces the
        notion of Compartment ID (CID) sealing. In this work, I
        implement a necessary subset of this approach that allows
        out-of-order microarchitectures to keep track of current
        software-defined CIDs. I find this approach to have significant
        performance advantages over conventional speculation fences.
    </description>
  </item>
  <item rdf:about="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-1000.html">
    <title>Application identification in data centres: a traffic driven approach</title>
    <link>https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-1000.pdf</link>
    <dc:creator>Popeanga, Mihnea-Stefan</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2025-08</dc:date>
    <description>
        Modern data centre (DC) operators cannot tune and secure what
        they cannot see. However, application identification from
        network traces is held back by two obstacles: public packet
        captures are scarce because commercial workloads and user data
        are confidential; the few datasets that exist do not focus on
        DC-specific workloads, and do not allow others to reproduce the
        experiments. This dissertation tackles both these issues. I
        designed and implemented an end-to-end framework that can
        systematically capture traffic with nanosecond timestamps,
        demultiplex flows, and compute a set of 203 features. Each flow
        is coupled with extensive metadata detailing the exact setup
        that generated the traffic, allowing any researcher to reproduce
        the experiments under identical conditions. Using this workflow,
        I created the first public DC-focused dataset, unencumbered with
        personal or confidential information, that spans three
        representative workloads. Machine learning classification
        techniques demonstrate the utility of the data: traditional
        feature-based models achieve perfect accuracy when identifying
        the three workloads. A core novelty is that besides strict
        identification, the collected data includes significant
        metadata. To demonstrate this, I tackled performance estimation
        as well for one of the workloads. A 1D CNN can distinguish
        between flows corresponding to different performance metrics
        with an accuracy of 95%.
    </description>
  </item>
  <item rdf:about="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-999.html">
    <title>CoverDrop White Paper</title>
    <link>https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-999.pdf</link>
    <dc:creator>Hugenroth, Daniel</dc:creator>
    <dc:creator>Cutler, Sam</dc:creator>
    <dc:creator>Kendrick, Dominic</dc:creator>
    <dc:creator>Savarese, Mario</dc:creator>
    <dc:creator>Hunter-Green, Zeke</dc:creator>
    <dc:creator>McMahon, Philip</dc:creator>
    <dc:creator>Kalanaki, Marjan</dc:creator>
    <dc:creator>Vasile, Diana A.</dc:creator>
    <dc:creator>Bejasa-Dimmock, Sabina</dc:creator>
    <dc:creator>Hoyland, Luke</dc:creator>
    <dc:creator>Beresford, Alastair R.</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2025-06</dc:date>
    <description>
        The free press fulfills an important function in a democracy. It
        can provide individuals with a mechanism through which they can
        hold powerful people and organizations to account. In previous
        work, the University of Cambridge developed CoverDrop: a set of
        extensions to a typical news app which provided a secure and
        usable method of establishing initial contact between
        journalists and sources. Since publication, The Guardian and the
        University of Cambridge have undertaken further work on the
        design, highlighting additional challenges and shortcomings
        which needed to be addressed before deployment. This white paper
        presents an updated design of the CoverDrop system which
        addresses these issues, and describes the version that The
        Guardian first deployed in April 2025.
    </description>
  </item>
  <item rdf:about="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-998.html">
    <title>Modelling orchestration</title>
    <link>https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-998.pdf</link>
    <dc:creator>Jeffery, Andrew</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2025-05</dc:date>
    <description>
        Modern cloud services operate at significant and increasing
        scale. The growth of these services has led to the need for
        automated management to keep them operational across many
        thousands of nodes and multiple geo-distributed sites.
        Orchestrators are the platforms designed to automate this
        management and standardise the workflows involved.
        
        The significant uptake of modern orchestrators means that they
        have expanded their scope out of private datacenters, into the
        public cloud, and now even towards the edge of the network.
        These are environments for which they are not designed, and
        while they share some characteristics with private datacenters,
        the differences are sufficiently significant to require
        rethinking the design of the orchestrators.
        
        In this dissertation, I examine orchestrator design, focusing on
        the global state they maintain in their central datastores. To
        do this I propose a definition of the orchestration problem and
        provide a lightweight formalisation using model checking. I use
        this model to explore the properties of an existing
        orchestrator, explaining observed failures arising from changes
        in the consistency model. I then explore the impact of
        variations to the consistency model of the global state on
        properties and performance of the model checking.
        
        Using insights from this model and its consistency analysis I
        then propose two new datastores to support the control-plane of
        orchestration platforms, for the public cloud and the near-edge.
        In the public cloud data confidentiality is paramount, trying to
        minimise the actors within the trust boundary to enable secure,
        trusted deployments. For the near-edge I focus on availability
        of a single cluster, enabling individual locations to process
        requests without reliance on persistent non-local communication.
        
        Together, these components, the model and the two datastores,
        enable orchestration platforms to be optimised for their
        environments, enabling more widespread use.
    </description>
  </item>
  <item rdf:about="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-997.html">
    <title>CHERI-SIMT report: implementing capability memory protection in GPGPUs</title>
    <link>https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-997.pdf</link>
    <dc:creator>Naylor, Matthew</dc:creator>
    <dc:creator>Joannou, Alexandre</dc:creator>
    <dc:creator>Markettos, A. Theodore</dc:creator>
    <dc:creator>Metzger, Paul</dc:creator>
    <dc:creator>Moore, Simon W.</dc:creator>
    <dc:creator>Jones, Timothy M.</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2025-03</dc:date>
    <description>
        Governments are increasingly advising software manufacturers to
        employ memory-safe languages and technologies to combat
        adversarial attacks on modern computing infrastructure. This
        introduces pressures across the entire computing industry,
        including GPGPU vendors who provide implementations of unsafe
        C/C++-based languages, such as CUDA and OpenCL, for programming
        the devices they produce. One of the memory-safety technologies
        being recommended is Capability Hardware Enhanced RISC
        Instructions (CHERI). CHERI builds strong and efficient memory
        safety into underlying instruction-set architectures allowing
        continued, but memory-safe, use of C/C++-based languages on top.
        Another option being recommended is Rust, a memory-safe systems
        programming language that can viably replace C/C++ in some
        cases.
        
        In this report, we evaluate the feasibility of incorporating
        CHERI into GPGPU architectures by extending a prototype,
        open-source, synthesisable, SIMT core and CUDA-like programming
        environment with support for CHERI. We present techniques to
        considerably ameliorate the costs of CHERI in SIMT designs,
        reducing register-file storage overheads from 103% to 7%,
        logic-area overheads by 44% to a cost comparable to one
        additional multiplier per vector lane, and execution-time
        overheads to 1.6%. By comparison, an experimental Rust port of
        the same GPGPU benchmark suite shows a 34% increase in execution
        time due to software bounds checking. With the proposed
        techniques, CHERI offers a viable path to strong and efficient
        GPGPU memory safety, while avoiding the need to replace
        established programming practices.
    </description>
  </item>
  <item rdf:about="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-996.html">
    <title>It is time to standardize principles and practices for software memory safety (extended version)</title>
    <link>https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-996.pdf</link>
    <dc:creator>Watson, Robert N. M.</dc:creator>
    <dc:creator>Baldwin, John</dc:creator>
    <dc:creator>Chen, Tony</dc:creator>
    <dc:creator>Chisnall, David</dc:creator>
    <dc:creator>Clarke, Jessica</dc:creator>
    <dc:creator>Davis, Brooks</dc:creator>
    <dc:creator>Filardo, Nathaniel Wesley</dc:creator>
    <dc:creator>Gutstein, Brett</dc:creator>
    <dc:creator>Jenkinson, Graeme</dc:creator>
    <dc:creator>Kern, Christoph</dc:creator>
    <dc:creator>Laurie, Ben</dc:creator>
    <dc:creator>Mazzinghi, Alfredo</dc:creator>
    <dc:creator>Moore, Simon W.</dc:creator>
    <dc:creator>Neumann, Peter G.</dc:creator>
    <dc:creator>Okhravi, Hamed</dc:creator>
    <dc:creator>Rebert, Alex</dc:creator>
    <dc:creator>Richardson, Alex</dc:creator>
    <dc:creator>Sewell, Peter</dc:creator>
    <dc:creator>Tratt, Laurence</dc:creator>
    <dc:creator>Vijayaraghavan, Murali</dc:creator>
    <dc:creator>Vincent, Hugo</dc:creator>
    <dc:creator>Witaszczyk, Konrad</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2025-02</dc:date>
    <description>
        This is the extended version of the paper, “It is time to
        standardize principles and practices for software memory
        safety”, which appeared in the February 2025 issue of
        Communications of the ACM.
        
        In this report, we explore memory-safety standardization, which
        we argue is an essential step to promoting universal strong
        memory safety in government and industry, and, in turn, to
        ensuring access to more secure software for all. Over the last two
        decades, a set of four research technologies for strong memory
        safety – memory-safe systems languages, hardware and software
        memory protection, formal approaches, and software
        compartmentalization – have reached sufficient maturity to see
        early deployment in security-critical use cases. However, there
        remains no shared, technology-neutral terminology or framework
        with which to specify memory-safety requirements.
        
        This is needed to enable reliable specification, design,
        implementation, auditing, and procurement of strongly
        memory-safe systems. Failure to speak in a common language makes
        it difficult to understand the possibilities or communicate
        accurately with one another, limiting perceived benefits and
        hence actual demand. The lack of such a framework also acts as
        an impediment to potential future policy interventions, and, in
        turn, as an impediment to stating requirements to address
        observed market failures preventing adoption of these
        technologies. Standardization would also play a critical role in
        improving industrial best practice, another key aspect of
        adoption.
        
        We begin with an overview of the many techniques – from hardware
        to software to formal theories – that have been developed and
        redefined over several decades, and how each plays a part in
        moving us towards strong memory safety. We explore how these
        technologies can be differentiated, considering both differences
        in functional protection and strength. We discuss how adoption
        barriers and potential market failures have limited adoption,
        and how the standardization gap limits potential interventions.
        We propose potential approaches to standardization – likely a
        task not limited to any one institution or standards body – and
        conclude with an illustrative universal memory-safety adoption
        timeline proposing a realistic path to universal adoption given
        suitable incentivization.
    </description>
  </item>
  <item rdf:about="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-995.html">
    <title>Fragment-template power-analysis attacks against microcontroller implementations of the 32-bit stream cipher ChaCha</title>
    <link>https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-995.pdf</link>
    <dc:creator>Batchelor, Henry</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2024-07</dc:date>
    <description>
        ChaCha is a widely adopted stream cipher, used for both random
        number generation and encryption. I propose a factor graph of
        ChaCha to improve the success rate of side-channel attacks that
        provide leakages throughout the entire execution of the
        algorithm. I also assess (fragment) template attacks against
        several implementations of ChaCha to demonstrate that the factor
        graph is helpful when working with actual side-channel attacks.
        
        These attacks could fully recover the correct key from an 8-bit
        implementation. In contrast, a 32-bit implementation, with most
        of the state held in registers, was significantly more
        challenging to attack. An adversary with access to 10 power
        traces and an incremented counter could achieve a success rate
        of 14.6%. For a 32-bit implementation, with lots of SRAM
        activity, an attacker could successfully recover the key in 2.6%
        of cases from a single trace.
    </description>
  </item>
  <item rdf:about="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-994.html">
    <title>Deception and defense from machine learning to supply chains</title>
    <link>https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-994.pdf</link>
    <dc:creator>Boucher, Nicholas</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2024-05</dc:date>
    <description>
        Broad classes of modern cyberattacks are dependent upon their
        ability to deceive human victims. Given the ubiquity of text
        across modern computational systems, we present and analyze a
        set of techniques that attack the encoding of text to produce
        deceptive inputs to critical systems. By targeting a core
        building block of modern systems, we can adversarially
        manipulate dependent applications ranging from natural language
        processing pipelines to search engines to code compilers. Left
        undefended, these vulnerabilities enable many ill effects
        including uncurtailed online hate speech, disinformation
        campaigns, and software supply chain attacks.
        
        We begin by generating adversarial examples for text-based
        machine learning systems. Due to the discrete nature of text,
        adversarial examples for text pipelines have traditionally
        involved conspicuous perturbations compared to the subtle
        changes of the more continuous visual and auditory domains.
        Instead, we propose imperceptible perturbations: techniques that
        manipulate text encodings without affecting the text in its
        rendered form. We use these techniques to craft the first set of
        adversarial examples for text-based machine learning systems
        that are human-indistinguishable from their unperturbed form,
        and demonstrate their efficacy against systems ranging from
        machine translation to toxic content detection. We also describe
        a set of defenses against these techniques.
        
        Next, we propose a new attack setting which we call adversarial
        search. In this setting, an adversary seeks to manipulate the
        results of search engines to surface certain results only and
        consistently when a hidden trigger is detected. We accomplish
        this by applying the encoding techniques of imperceptible
        perturbations to both indexed content and queries in major
        search engines. We demonstrate that imperceptibly encoded
        triggers can be used to manipulate the results of current
        commercial search engines, and then describe a social
        engineering attack exploiting this vulnerability that can be
        used to power disinformation campaigns. Again, we describe a set
        of defenses against these techniques.
        
        We then look to compilers and propose a different set of text
        perturbations which can be used to craft deceptive source code.
        We exploit the bidirectional nature of modern text standards to
        embed directionality control characters into comments and string
        literals. These control characters allow attackers to shuffle
        the sequence of tokens rendered in source code, and in doing so
        to implement programs that appear to do one thing when rendered
        to human code reviewers, but to do something different from the
        perspective of the compiler. We dub this technique the Trojan
        Source attack, and demonstrate the vulnerability of C, C++, C#,
        JavaScript, Java, Rust, Go, Python, SQL, Bash, Assembly, and
        Solidity. We also explore the applicability of this attack
        technique to launching supply chain attacks, and propose
        defenses that can be used to mitigate this risk. We further
        describe and analyze a 99-day coordinated disclosure that
        yielded patches to dozens of market-leading compilers, code
        editors, and code repositories.
        
        Finally, we propose a novel method of identifying software
        supply chain attacks that works not only for Trojan Source
        attacks, but for most forms of supply chain attacks. We describe
        an extension to compilers dubbed the Automated Bill of
        Materials, or ABOM, which embeds dependency metadata into
        compiled binaries. Specifically, hashes of each source code file
        consumed by a compiler are embedded into its emitted binary, and
        these hashes are included recursively into all downstream
        dependencies. They are stored in a highly space- and
        time-efficient probabilistic data structure that requires an expected
        value of just 2.1 bytes to represent each unique dependency
        source code file. With ABOMs, it becomes possible to detect all
        naturally occurring and most adversarially induced
        vulnerabilities used for supply chain attacks in downstream
        software by querying binaries for the presence of poisoned
        dependencies without the need to locate tangible indicators of
        compromise.
        
        In this thesis, we therefore demonstrate how weaknesses in a
        core building block of modern systems – text encodings – can
        cause failures in a wide range of domains including machine
        learning, search engines, and source code. We propose defenses
        against each variant of our attack, including a new tool to
        identify most generic software supply chain attacks. We believe
        that these techniques will be useful in securing software
        ecosystems against the next generation of attacks.
    </description>
  </item>
  <item rdf:about="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-993.html">
    <title>Distributional and relational inductive biases for graph representation learning in biomedicine</title>
    <link>https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-993.pdf</link>
    <dc:creator>Scherer, Paul M.</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2024-04</dc:date>
    <description>
        The immense complexity with which DNAs, RNAs, proteins and other
        biomolecules interact amongst themselves, with one another, and
        with the environment to bring about life processes motivates the mass
        collection of biomolecular data and data-driven modelling to
        gain insights into physiological phenomena. Recent predictive
        modelling efforts have focused on deep representation learning
        methods which offer a flexible modelling paradigm for handling
        high-dimensional data at scale and incorporating inductive
        biases. The emerging field of representation learning on
        graph-structured data opens opportunities to leverage the abundance of
        structured biomedical knowledge and data to improve model
        performance.
        
        Grand international initiatives have been coordinated to
        organise and structure our growing knowledge about the
        interactions and putative functions of biomolecular entities
        using graphs and networks. This dissertation considers how we
        may use the inductive biases within recent graph representation
        learning methods to leverage these structures and incorporate
        biologically relevant relational priors into machine learning
        methods for biomedicine. We present contributions in two parts
        with the aim of fostering research in this multidisciplinary domain
        and presenting novel methods that achieve strong performance
        through the use of distributional and relational inductive
        biases operating on graph-structured biomedical knowledge and
        data.
        
        The first part is concerned with consolidating and expanding the
        current ecosystem of practical frameworks dedicated to graph
        representation learning. Our first contribution presents Geo2DR,
        the first practical framework and software library for
        constructing methods capable of learning distributed
        representations of graphs. Our second contribution, PyTorch
        Geometric Temporal, is the first open-source representation
        learning library for dynamic graphs, expanding the scope of
        research software on graph neural networks, which was previously
        limited to static graphs.
        
        The second part presents three methods wherein each contribution
        tackles an active biomedical research problem using relational
        structures that exist within different aspects of the data.
        First, we present a methodology for learning distributed
        representations of molecular graphs in the context of drug pair
        scoring. Next, we present a method for leveraging structured
        knowledge on the variables of gene expression profiles to
        automatically construct sparse neural models for cancer
        subtyping. Finally, we present a state-of-the-art cell
        deconvolution model for spatial transcriptomics data using the
        positional relationships between observations in the dataset.
    </description>
  </item>
  <item rdf:about="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-992.html">
    <title>Transient execution vulnerabilities in the security context of server hardware</title>
    <link>https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-992.pdf</link>
    <dc:creator>Randal, Allison</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2023-12</dc:date>
    <description>
        Many mitigations have been proposed and implemented for many
        variants of the transient execution vulnerabilities, and while
        the Meltdown-type exception-based transient execution
        vulnerabilities have proven to be tractable, Spectre-type
        vulnerabilities and other speculation-based transient execution
        vulnerabilities have been far more resistant to countermeasures.
        For smaller-scale embedded systems or security-focused hardware
        such as a cryptographic system or a root-of-trust (RoT),
        eliminating speculation is widely accepted as a reasonable
        approach to improving security. But for larger-scale and
        general-purpose hardware, eliminating speculation is often
        dismissed as inconceivable, though the claim that speculation is
        required for adequate performance is rarely supported by
        concrete performance results. The performance results we do have
        from several independent strands of research over the past few
        decades have shown that speculation features on large-scale
        server hardware do not offer the same performance advantages as
        on smaller-scale hardware, so eliminating speculation on
        large-scale server hardware does not harm performance as much as
        we might suspect. And selective speculation techniques have
        shown that speculation-based transient execution vulnerabilities
        can be mitigated by a partial elimination of speculation, so we
        can preserve some of the performance of speculation while
        subduing the security risk. In order to demonstrate that
        eliminating speculation is a feasible approach to mitigating the
        transient execution vulnerabilities on large-scale server
        hardware, this work considers three alternative approaches that
        partially or completely eliminate speculative execution:
        heterogeneous multicore systems combining speculative and
        non-speculative cores; entirely non-speculative
        microarchitectures; and selective speculation
        microarchitectures.
    </description>
  </item>
  <item rdf:about="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-991.html">
    <title>Porting a mix network client to mobile</title>
    <link>https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-991.pdf</link>
    <dc:creator>Kung, Jacky W. E.</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2023-12</dc:date>
    <description>
        This project set out to investigate the feasibility of mix
        network clients in the mobile ecosystem. It considers the
        Android operating system, and Nym, a production-grade mix
        network infrastructure based on the abstract Loopix architecture
        first presented in 2017. The goal of the project was to produce
        a minimal working prototype, and present an evaluation of the
        trade-offs necessary for an efficient implementation in the
        Android ecosystem. Nym’s client codebase, written in Rust, has
        been successfully ported over to Android after adjusting parts
        of the code and constructing the compilation toolchain. An
        exploration of the performance effects of compilation parameters
        and mixnet parameters is presented. Two extension tasks were
        completed: a semi-automated compilation pipeline, and further
        evaluation based on measurements taken with the custom hardware
        provided by my supervisor. The repository also contains, as a
        side-product, a Rust library that provides a friendly interface
        between code that runs across the Rust and Kotlin languages.
    </description>
  </item>
</rdf:RDF>
