<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
>
  <channel rdf:about="http://www.cl.cam.ac.uk/techreports/">
    <title>Computer Laboratory Technical Reports</title>
    <link>http://www.cl.cam.ac.uk/techreports/</link>
    <description>Recent research reports published by the Computer Laboratory at the University of Cambridge.</description>
    <items>
      <rdf:Seq>
        <rdf:li resource="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-833.html" />
        <rdf:li resource="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-832.html" />
        <rdf:li resource="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-831.html" />
        <rdf:li resource="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-830.html" />
        <rdf:li resource="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-829.html" />
        <rdf:li resource="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-828.html" />
        <rdf:li resource="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-827.html" />
        <rdf:li resource="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-826.html" />
        <rdf:li resource="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-824.html" />
        <rdf:li resource="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-823.html" />
        <rdf:li resource="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-822.html" />
        <rdf:li resource="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-821.html" />
        <rdf:li resource="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-820.html" />
        <rdf:li resource="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-819.html" />
        <rdf:li resource="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-818.html" />
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-833.html">
    <title>Practice-led design and evaluation of a live visual constraint language</title>
    <link>http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-833.pdf</link>
    <dc:creator>Blackwell, Alan F.</dc:creator>
    <dc:creator>Charalampidis, Ignatios</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2013-05</dc:date>
    <description>
        We report an experimental evaluation of Palimpsest, a novel
        purely-visual programming language. A working prototype of
        Palimpsest had been developed following a practice-led process,
        in order to assess whether tools for use in the visual arts can
        usefully be created by adopting development processes that
        emulate arts practice. This initial prototype was received more
        positively by users who have high self-efficacy in both visual
        arts and computer use. A number of potential usability
        improvements are identified, structured according to the
        Cognitive Dimensions of Notations framework.
    </description>
  </item>
  <item rdf:about="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-832.html">
    <title>Communication centric, multi-core, fine-grained processor architecture</title>
    <link>http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-832.pdf</link>
    <dc:creator>Chadwick, Gregory A.</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2013-04</dc:date>
    <description>
        With multi-core architectures now firmly entrenched in many
        application areas, both computer architects and programmers
        face new challenges. Computer architects must increase core
        count to increase the explicit parallelism available to the
        programmer in order to provide better performance whilst keeping
        the programming model tractable. The programmer must
        find ways to exploit this explicit parallelism that
        scale well with increasing core and thread availability.
        
        A fine-grained computation model allows the programmer to expose
        a large amount of explicit parallelism and the greater the level
        of parallelism exposed the better increasing core counts can be
        utilised. However, a fine-grained approach implies many
        interworking threads and the overhead of synchronising and
        scheduling these threads can eradicate any scalability
        advantages a fine-grained program may have.
        
        Communication is also a key issue in multi-core architecture.
        Wires do not scale as well as gates, making communication
        relatively more expensive compared to computation so optimising
        communication between cores on chip becomes important.
        
        This dissertation presents an architecture designed to enable
        scalable fine-grained computation that is communication aware
        (allowing a programmer to optimise for communication). By
        combining a tagged memory, where each word is augmented with a
        presence bit signifying whether or not data is present in that
        word, with a hardware-based scheduler, which allows a thread to
        wait upon a word becoming present with low overhead, a flexible
        and scalable architecture well suited to fine-grained
        computation can be created, one which enables this without
        needing the introduction of many new architectural features or
        instructions. Communication is made explicit by enforcing that
        accesses to a given area of memory will always go to the same
        cache, removing the need for a cache coherency protocol.
        
        The dissertation begins by reviewing the need for multi-core
        architectures and discusses the major issues faced in their
        construction. It moves on to look at fine-grained computation in
        particular. The proposed architecture, known as Mamba, is then
        presented in detail with several software techniques suitable
        for use with it introduced. An FPGA implementation of Mamba is
        then evaluated against a similar architecture that lacks the
        extensions Mamba has for assisting in fine-grained computation
        (namely a memory tagged with presence bits and a hardware
        scheduler). Microbenchmarks examining the performance of FIFO
        based communication, MCS locks (an efficient spin-lock
        implementation based around queues) and barriers demonstrate
        Mamba’s scalability and insensitivity to thread count. A SAT
        solver implementation demonstrates that these benefits have a
        real impact on an actual application.
    </description>
  </item>
  <item rdf:about="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-831.html">
    <title>Communication for programmability and performance on multi-core processors</title>
    <link>http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-831.pdf</link>
    <dc:creator>Luff, Meredydd</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2013-04</dc:date>
    <description>
        The transition to multi-core processors has yielded a
        fundamentally new sort of computer. Software can no longer
        benefit passively from improvements in processor technology, but
        must perform its computations in parallel if it is to take
        advantage of the continued increase in processing power.
        Software development has yet to catch up, and with good reason:
        parallel programming is hard, error-prone and often unrewarding.
        
        In this dissertation, I consider the programmability challenges
        of the multi-core era, and examine three angles of attack.
        
        I begin by reviewing alternative programming paradigms which aim
        to address these changes, and investigate two popular
        alternatives with a controlled pilot experiment. The results are
        inconclusive, and subsequent studies in that field have suffered
        from similar weaknesses. This leads me to conclude that empirical
        user studies are poor tools for designing parallel programming
        systems.
        
        I then consider one such alternative paradigm, transactional
        memory, which has promising usability characteristics but
        suffers performance overheads so severe that they mask its
        benefits. By modelling an ideal inter-core communication
        mechanism, I propose using our embarrassment of parallel riches
        to mitigate these overheads. By pairing “helper” processors with
        application threads, I offload the overheads of software
        transactional memory, thereby greatly mitigating the problem of
        serial overhead.
        
        Finally, I address the mechanics of inter-core communication.
        Due to the use of cache coherence to preserve the programming
        model of previous processors, explicitly communicating between
        the cores of any modern multi-core processor is painfully slow.
        The schemes proposed so far to alleviate this problem are
        complex, insufficiently general, and often introduce new
        resources which cannot be virtualised transparently by a
        time-sharing operating system. I propose and describe an
        asynchronous remote store instruction, which is issued by one
        core and completed asynchronously by another into its own local
        cache. I evaluate several patterns of parallel communication,
        and determine that the use of remote stores greatly increases
        the performance of common synchronisation kernels. I quantify
        the benefit to the feasibility of fine-grained parallelism. To
        finish, I use this mechanism to implement my parallel STM
        scheme, and demonstrate that it performs well, reducing
        overheads significantly.
    </description>
  </item>
  <item rdf:about="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-830.html">
    <title>Massively parallel neural computation</title>
    <link>http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-830.pdf</link>
    <dc:creator>Fox, Paul J.</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2013-03</dc:date>
    <description>
        Reverse-engineering the brain is one of the US National Academy
        of Engineering’s ‘Grand Challenges’. The structure of the brain
        can be examined at many different levels, spanning many
        disciplines from low-level biology through psychology and
        computer science. This thesis focusses on real-time computation
        of large neural networks using the Izhikevich spiking neuron
        model.
        
        Neural computation has been described as ‘embarrassingly
        parallel’ as each neuron can be thought of as an independent
        system, with behaviour described by a mathematical model.
        However, the real challenge lies in modelling neural
        communication. While the connectivity of neurons has some
        parallels with that of electrical systems, its high fan-out
        results in massive data processing and communication
        requirements when modelling neural communication, particularly
        for real-time computations.
        
        It is shown that memory bandwidth is the most significant
        constraint on the scale of real-time neural computation,
        followed by communication bandwidth, which leads to a decision
        to implement a neural computation system on a platform based on
        a network of Field Programmable Gate Arrays (FPGAs), using
        commercial off-the-shelf components with some custom supporting
        infrastructure. This brings implementation challenges,
        particularly lack of on-chip memory, but also many advantages,
        particularly high-speed transceivers. An algorithm to model
        neural communication that makes efficient use of memory and
        communication resources is developed and then used to implement
        a neural computation system on the multi-FPGA platform.
        
        Finding suitable benchmark neural networks for a massively
        parallel neural computation system proves to be a challenge. A
        synthetic benchmark that has biologically-plausible fan-out,
        spike frequency and spike volume is proposed and used to
        evaluate the system. It is shown to be capable of computing the
        activity of a network of 256k Izhikevich spiking neurons with a
        fan-out of 1k in real-time using a network of 4 FPGA boards.
        This compares favourably with previous work, with the added
        advantage of scalability to larger neural networks using more
        FPGAs.
        
        It is concluded that communication must be considered as a
        first-class design constraint when implementing massively
        parallel neural computation systems.
    </description>
  </item>
  <item rdf:about="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-829.html">
    <title>Microelectronic security measures</title>
    <link>http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-829.pdf</link>
    <dc:creator>Paul, Philip Christopher</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2013-02</dc:date>
    <description>
        In this dissertation I propose the concept of tamper protection
        grids for microelectronic security devices made from organic
        electronic materials. As security devices have become ubiquitous
        in recent years, they are becoming targets for criminal
        activity. One general attack route to breach the security is to
        carry out physical attack after depackaging a device. Commercial
        security devices use a metal wire mesh within the chip to
        protect against these attacks. However, as a microchip is
        physically robust, the mesh is not affected by depackaging.
        
        As a better way of protecting security devices against attacks
        requiring the chip package to be removed, I investigate a
        protection grid that is vulnerable to damage if the packaging is
        tampered with. The protection grid is connected directly to
        standard bond pads on the microchip, to allow direct electronic
        measurements, saving the need for complex sensor structures.
        That way, a security device can monitor the package for
        integrity, and initiate countermeasures if required.
        
        The feasibility of organic tamper protection grids was
        evaluated. To establish the viability of the concept, a
        fabrication method for these devices was developed, the
        sensitivity to depackaging was assessed, and practical
        implementation issues were evaluated. Inkjet printing was chosen
        as the fabrication route, as devices can be produced at low cost
        while preserving flexibility of layout. A solution to the
        problem of adverse surface interaction was found to ensure good
        print quality on the hydrophobic chip surface. Standard contacts
        between chip and grid are non-linear and degrade between
        measurements; however, it was shown that stable ohmic contacts
        are possible using a silver buffer layer. The sensitivity of the
        grid to reported depackaging methods was tested, and
        improvements to the structure were found to maximise damage to
        the grid upon tampering with the package. Practical issues such
        as measurement stability with temperature and age were
        evaluated, as well as a first prototype to assess the achievable
        measurement accuracy. The evaluation of these practical issues
        shows directions for future work that can develop organic
        protection grids beyond the proof of concept.
        
        Apart from the previously mentioned invasive attacks, there is a
        second category of attacks, non-invasive attacks, that do not
        require the removal of the chip packaging. The most prominent
        non-invasive attack is power analysis, in which the power
        consumption of a device is used as an oracle to reveal the secret
        key of a security device. Logic gates were designed and
        fabricated with data-independent power consumption in each clock
        cycle. However, it is shown that this is not sufficient to
        protect the secret key. Despite balancing the discharged
        capacitances in each clock cycle, the power consumed still
        depends on the data input. While the overall charge consumed in
        each clock cycle matches to a few percent, differences within a
        clock cycle can easily be measured. It was shown that the
        dominant cause for this imbalance is early propagation, which
        can be mitigated by ensuring that evaluation in a gate only
        takes place after all inputs are present. The second major
        source of imbalance is mismatched discharge paths in logic
        gates, which result in data-dependent evaluation times of a
        gate. This source of imbalance is not as trivial to remove, as
        it conflicts with balancing the discharged capacitances in each
        clock cycle.
    </description>
  </item>
  <item rdf:about="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-828.html">
    <title>Exploring networks-on-chip for FPGAs</title>
    <link>http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-828.pdf</link>
    <dc:creator>Francis, Rosemary M.</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2013-01</dc:date>
    <description>
        Developments in fabrication processes have shifted the cost
        ratio between wires and transistors to allow new trade-offs
        between computation and communication. Rising clock speeds have
        led to multi-cycle cross-chip communication and pipelined
        buses. It is then a small step from pipelining to switching and
        the development of multi-core networked systems-on-chip. Modern
        FPGAs are also now home to complex systems-on-chip. A change in
        the way we structure the computation demands a change in the way
        we structure the communication on-chip.
        
        This thesis looks at Network-on-Chip design for FPGAs beyond the
        trade-offs between hard (silicon) and soft (configurable)
        designs. FPGAs are capable of extremely flexible, statically
        routed bit-based wiring, but this flexibility comes at a high
        area, latency and power cost. Soft NoCs are able to maintain
        this flexibility, but do not necessarily make good use of the
        computation-communication trade-off. Hard NoCs are more
        efficient when used, but are forced to operate below capacity by
        the soft IP cores. It is also difficult to design hard NoCs with
        the flexibility needed without wasting silicon when the network
        is not used.
        
        In the first part of this thesis I explore the capability of
        Time-Division Multiplexed (TDM) wiring to bridge the gap between
        the fine-grain static FPGA wiring and the bus-based dynamic
        routing of a NoC. By replacing some of the static FPGA wiring
        with TDM wiring I am able to time-division multiplex hard
        routers and make better use of the non-configurable area. The
        cost of a hard network is reduced by moving some of the area
        cost from the routers into reusable TDM wiring components. The
        TDM wiring improves the interface between the hard routers and
        soft IP blocks which leads to higher logic density overall. I
        show that TDM wiring makes hard routers a flexible and efficient
        alternative to soft interconnect.
        
        The second part of this thesis looks at the feasibility of
        replacing all static wiring on the FPGA with TDM wiring. The aim
        was to increase the routing capacity of the FPGA whilst
        decreasing the area used to implement it. An ECAD flow was
        developed to explore the extent to which the amount of wiring
        can be reduced. The results were then used to design the TDM
        circuitry.
        
        My results show that an 80% reduction in the amount of wiring is
        possible through time-division multiplexing. This reduction is
        sufficient to increase the routing capacity of the FPGA whilst
        maintaining similar or better logic density. This TDM wiring can
        be used to implement area- and power-efficient hard
        networks-on-chip with good flexibility, as well as improving the
        performance of other hard IP blocks.
    </description>
  </item>
  <item rdf:about="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-827.html">
    <title>The smart card detective: a hand-held EMV interceptor</title>
    <link>http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-827.pdf</link>
    <dc:creator>Choudary, Omar S.</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2012-12</dc:date>
    <description>
        Several vulnerabilities have been found in the EMV system (also
        known as Chip and PIN). Saar Drimer and Steven Murdoch have
        successfully implemented a relay attack against EMV using a fake
        terminal. Recently the same authors have found a method to
        successfully complete PIN transactions without actually entering
        the correct PIN. The press has published this vulnerability but
        they reported such a scenario as being hard to execute in
        practice because it requires specialized and complex hardware.
        
        As proposed by Ross Anderson and Mike Bond in 2006, I decided to
        create a miniature man-in-the-middle device to defend smartcard
        users against relay attacks.
        
        As a result of my MPhil project work I created a hand-held
        device, called Smart Card Detective (SCD), which intercepts the
        communication between smartcard and terminal. The device has
        been built using a low cost ATMEL AT90USB1287 microcontroller
        and other readily available electronic components. The total
        cost of the SCD has been around £100, but an industrial version
        could be produced for less than £20.
        
        I implemented several applications using the SCD, including the
        defense against the relay attack as well as the recently
        discovered vulnerability to complete a transaction without using
        the correct PIN.
        
        All the applications have been successfully tested on CAP
        readers and live terminals. Furthermore, I have performed real
        tests using the SCD at several shops in town.
        
        From the experiments using the SCD, I have noticed some
        particularities of the CAP protocol compared to the EMV
        standard. I have also discovered that the smartcard does not
        follow the physical transport protocol exactly. Such findings
        are presented in detail, along with a discussion of the results.
    </description>
  </item>
  <item rdf:about="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-826.html">
    <title>GREEN IPTV: a resource and energy efficient network for IPTV</title>
    <link>http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-826.pdf</link>
    <dc:creator>Ramos, Fernando M. V.</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2012-12</dc:date>
    <description>
        The distribution of television is currently dominated by three
        technologies: over-the-air broadcast, cable, and satellite. The
        advent of IP networks and the increased availability of
        broadband access created a new vehicle for the distribution of
        TV services. The distribution of digital TV services over IP
        networks, or IPTV, offers carriers flexibility and added value
        in the form of additional services. The rapid roll-out of IPTV
        services by operators worldwide in the past few years therefore
        comes as no surprise.
        
        IPTV distribution imposes stringent requirements on both
        performance and reliability. It is therefore challenging for an
        IPTV operator to guarantee the quality of experience expected by
        its users, and to do so in an efficient manner. In this
        dissertation I investigate some of the challenges faced by IPTV
        distribution network operators, and I propose novel techniques
        to address these challenges.
        
        First, I address one of the major concerns of IPTV network
        deployment: channel change delay. This is the latency
        experienced by users when switching between TV channels.
        Synchronisation and buffering of video streams can cause channel
        change delays of several seconds. I perform an empirical
        analysis of a particular solution to the channel change delay
        problem, namely, predictive pre-joining of TV channels. In this
        scheme each Set Top Box simultaneously joins additional
        multicast groups (TV channels) along with the one requested by
        the user. If the user switches to any of these channels next,
        switching latency is virtually eliminated, and user experience
        is improved. The results show that it is possible to eliminate
        zapping delay for a significant percentage of channel switching
        requests with little impact on access network bandwidth cost.
        
        Second, I propose a technique to increase the resource and
        energy efficiency of IPTV networks. This technique is based on a
        simple paradigm: avoiding waste. To reduce the inefficiencies of
        current static multicast distribution schemes, I propose a
        semi-dynamic scheme where only a selection of TV multicast
        groups is distributed in the network, instead of all. I perform
        an empirical evaluation of this method and conclude that its use
        results in significant bandwidth reductions without compromising
        service performance. I also demonstrate that these reductions
        may translate into significant energy savings in the future.
        
        Third, to increase energy efficiency further I propose a novel
        energy and resource friendly protocol for core optical IPTV
        networks. The idea is for popular IPTV traffic to optically
        bypass the network nodes, avoiding electronic processing. I
        evaluate this proposal empirically and conclude that the
        introduction of optical switching techniques results in a
        significant increase in the energy efficiency of IPTV networks.
        
        All the schemes I present in this dissertation are evaluated by
        means of trace-driven analyses using a dataset from an
        operational IPTV service provider. Such thorough and realistic
        evaluation enables the assessment of the proposed techniques
        with an increased level of confidence, and is therefore a
        strength of this dissertation.
    </description>
  </item>
  <item rdf:about="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-824.html">
    <title>Hardware synthesis from a stream-processing functional language</title>
    <link>http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-824.pdf</link>
    <dc:creator>Frankau, Simon</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2012-11</dc:date>
    <description>
        As hardware designs grow exponentially larger, there is an
        increasing challenge to use transistor budgets effectively.
        Without higher-level synthesis tools, so much effort may be
        spent on low-level details that it becomes impractical to
        effectively design circuits of the size that can be fabricated.
        This possibility of a design gap has been documented for some
        time now.
        
        One solution is the use of domain-specific languages. This
        thesis covers the use of software-like languages to describe
        algorithms that are to be implemented in hardware. Hardware
        engineers can use the tools to improve their productivity and
        effectiveness in this particular domain. Software engineers can
        also use this approach to benefit from the parallelism available
        in modern hardware (such as reconfigurable systems and FPGAs),
        while retaining the convenience of a software description.
        
        In this thesis a statically-allocated pure functional language,
        SASL, is introduced. Static allocation makes the language suited
        to implementation in fixed hardware resources. The I/O model is
        based on streams (linear lazy lists), and implicit parallelism
        is used in order to maintain a software-like approach. The
        thesis contributes constraints which allow the language to be
        statically-allocated, and synthesis techniques for SASL
        targeting both basic CSP and a graph-based target that may be
        compiled to a register-transfer level (RTL) description.
        
        Further chapters examine the optimisation of the language,
        including the use of lenient evaluation to increase parallelism,
        the introduction of closures and general lazy evaluation, and
        the use of non-determinism in the language. The extensions are
        examined in terms of the restrictions required to ensure static
        allocation, and the techniques required to synthesise them.
    </description>
  </item>
  <item rdf:about="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-823.html">
    <title>Mitigating I/O latency in SSD-based graph traversal</title>
    <link>http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-823.pdf</link>
    <dc:creator>Roy, Amitabha</dc:creator>
    <dc:creator>Nilakant, Karthik</dc:creator>
    <dc:creator>Dalibard, Valentin</dc:creator>
    <dc:creator>Yoneki, Eiko</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2012-11</dc:date>
    <description>
        Mining large graphs has now become an important aspect of many
        applications. Recent interest in low cost graph traversal on
        single machines has led to the construction of systems that use
        solid state drives (SSDs) to store the graph. An SSD can be
        accessed with far lower latency than magnetic media, while
        remaining cheaper than main memory. Unfortunately SSDs are
        slower than main memory and algorithms running on such systems
        are hampered by large IO latencies when accessing the SSD. In
        this paper we present two novel techniques to reduce the impact
        of SSD IO latency on semi-external memory graph traversal. We
        introduce a variant of the Compressed Sparse Row (CSR) format
        that we call Compressed Enumerated Encoded Sparse Offset Row
        (CEESOR). CEESOR is particularly efficient for graphs with
        hierarchical structure and can reduce the space required to
        represent connectivity information by amounts varying from 5% to
        as much as 76%. CEESOR allows a larger number of edges to be
        moved for each unit of IO transfer from the SSD to main memory
        and more effective use of operating system caches. Our second
        contribution is a runtime prefetching technique that exploits
        the ability of solid state drives to service multiple random
        access requests in parallel. We present a novel Run Along SSD
        Prefetcher (RASP). RASP is capable of hiding the effect of IO
        latency in single-threaded graph traversal in breadth-first and
        shortest-path order to the extent that it improves iteration time
        for large graphs by factors ranging from 2.6X to 6X.
    </description>
  </item>
  <item rdf:about="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-822.html">
    <title>Planning with preferences using maximum satisfiability</title>
    <link>http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-822.pdf</link>
    <dc:creator>Russell, Richard A.</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2012-10</dc:date>
    <description>
        The objective of automated planning is to synthesise a plan that
        achieves a set of goals specified by the user. When achieving
        every goal is not feasible, the planning system must decide
        which ones to plan for and find the lowest cost plan. The system
        should take as input a description of the user’s preferences and
        the costs incurred through executing actions. Goal utility
        dependencies arise when the utility of achieving a goal depends
        on the other goals that are achieved with it. This complicates
        the planning procedure because achieving a new goal can alter
        the utilities of all the other goals currently achieved.
        
        In this dissertation we present methods for solving planning
        problems with goal utility dependencies by compiling them to a
        variant of satisfiability known as weighted partial maximum
        satisfiability (WPMax-SAT). An optimal solution to the encoding
        is found using a general-purpose solver. The encoding is
        constructed such that its optimal solution can be used to
        construct a plan that is most preferred amongst plans whose
        length fits within a prespecified horizon. We evaluate this
        approach against an integer programming based system using
        benchmark problems taken from past international planning
        competitions.
        
        We study how a WPMax-SAT solver might benefit from incorporating
        a procedure known as survey propagation. This is a message
        passing algorithm that estimates the probability that a variable
        is constrained to be a particular value in a randomly selected
        satisfying assignment. These estimates are used to influence
        variable/value decisions during search for a solution. Survey
        propagation is usually presented with respect to the
        satisfiability problem, and its generalisation, SP(y), with
        respect to the maximum satisfiability problem. We extend the
        argument that underpins these two algorithms to derive a new set
        of message passing equations for application to WPMax-SAT
        problems. We evaluate the success of this method by applying it
        to our encodings of planning problems with goal utility
        dependencies.
        
        Our results indicate that planning with preferences using
        WPMax-SAT is competitive with, and sometimes more successful than, an
        integer programming approach – solving two to three times more
        subproblems in some domains, while being outperformed by a
        smaller margin in others. In some domains, we also find that
        using information provided by survey propagation in a WPMax-SAT
        solver to select variable/value pairs for the earliest decisions
        can, on average, direct search to lower cost solutions than a
        uniform sampling strategy combined with a popular heuristic.
    </description>
  </item>
  <item rdf:about="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-821.html">
    <title>Modelling energy efficiency for computation</title>
    <link>http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-821.pdf</link>
    <dc:creator>Reams, Charles</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2012-10</dc:date>
    <description>
        In the last decade, efficient use of energy has become a topic
        of global significance, touching almost every area of modern
        life, including computing. From mobile to desktop to server,
        energy efficiency concerns are now ubiquitous. However,
        approaches to the energy problem are often piecemeal and focus
        on only one area for improvement.
        
        I argue that the strands of the energy problem are inextricably
        entangled and cannot be solved in isolation. I offer a
        high-level view of the problem and, building from it, explore a
        selection of subproblems within the field. I approach these with
        various levels of formality, and demonstrate techniques to make
        improvements on all levels. The original contributions are as
        follows:
        
        Chapter 3 frames the energy problem as one of optimisation with
        constraints, and explores the impact of this perspective for
        current commodity products. This includes considerations of the
        hardware, software and operating system. I summarise the current
        situation in these respects and propose directions in which they
        could be improved to better support energy management.
        
        Chapter 4 presents mathematical techniques to compute
        energy-optimal schedules for long-running computations. This
        work reflects the server-domain concern with energy cost,
        producing schedules that exploit fluctuations in power cost over
        time to minimise expenditure rather than raw energy. This
        assumes certain idealised models of power, performance, cost,
        and workload, and draws precise formal conclusions from them.
        
        Chapter 5 considers techniques to implement energy-efficient
        real-time streaming. Two classes of problem are considered:
        first, hard real-time streaming with fixed, predictable frame
        characteristics; second, soft real-time streaming with a
        quality-of-service guarantee and probabilistic descriptions of
        per-frame workload. Efficient algorithms are developed for
        scheduling frame execution in an energy-efficient way while
        still guaranteeing hard real-time deadlines. These schedules
        determine appropriate values for power-relevant parameters, such
        as dynamic voltage–frequency scaling.
        
        A key challenge for future work will be unifying these diverse
        approaches into one “Theory of Energy” for computing. The
        progress towards this is summarised in Chapter 6. The thesis
        concludes by sketching future work towards this Theory of
        Energy.
    </description>
  </item>
  <item rdf:about="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-820.html">
    <title>A unified graph query layer for multiple databases</title>
    <link>http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-820.pdf</link>
    <dc:creator>Yoneki, Eiko</dc:creator>
    <dc:creator>Roy, Amitabha</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2012-08</dc:date>
    <description>
        There is increasing demand to store and query data with an
        inherent graph structure. Examples of such data include those
        from online social networks, the semantic web and from
        navigational queries on spatial data such as maps.
        Unfortunately, traditional relational databases have fallen
        short where such graph structured data is concerned. This has
        led to the development of specialised graph databases such as
        Neo4j. However, traditional databases continue to have a wide
        usage base and have desirable properties such as the capacity to
        support a high volume of transactions while offering ACID
        semantics. In this paper we argue that it is in fact possible to
        unify different database paradigms in the case of graph
        structured data through the use of a common query language and
        data loader that we have named Crackle (a wordplay on
        Gra[ph]QL). Crackle provides an expressive and powerful query
        library in Clojure (a functional LISP dialect for JVMs). It also
        provides a data loader that is capable of interfacing
        transparently with various data sources such as PostgreSQL
        databases and the Redis key-value store. Crackle shields
        programmers from the backend database by allowing them to write
        queries in Clojure. Additionally, its graph-focused prefetchers
        are capable of closing the hitherto large gap between a
        PostgreSQL database and a specialised graph database such as
        Neo4j from as much as 326x (with a SQL query) to as low as 6x (when
        using Crackle). We also include a detailed performance analysis
        that identifies ways to further reduce this gap with Crackle.
        This brings into question the performance argument for
        specialised graph databases such as Neo4j by providing
        comparable performance on supposedly legacy data sources.
    </description>
  </item>
  <item rdf:about="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-819.html">
    <title>Guessing human-chosen secrets</title>
    <link>http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-819.pdf</link>
    <dc:creator>Bonneau, Joseph</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2012-05</dc:date>
    <description>
        Authenticating humans to computers remains a notable weak point
        in computer security despite decades of effort. Although the
        security research community has explored dozens of proposals for
        replacing or strengthening passwords, they appear likely to
        remain entrenched as the standard mechanism of human-computer
        authentication on the Internet for years to come. Even in the
        optimistic scenario of eliminating passwords from most of
        today’s authentication protocols using trusted hardware devices
        or trusted servers to perform federated authentication,
        passwords will persist as a means of “last-mile” authentication
        between humans and these trusted single sign-on deputies.
        
        This dissertation studies the difficulty of guessing
        human-chosen secrets, introducing a sound mathematical framework
        modeling human choice as a skewed probability distribution. We
        introduce a new metric, alpha-guesswork, which can accurately
        model the resistance of a distribution against all possible
        guessing attacks. We also study the statistical challenges of
        estimating this metric using empirical data sets which can be
        modeled as a large random sample from the underlying probability
        distribution.
        
        This framework is then used to evaluate several representative
        data sets from the most important categories of human-chosen
        secrets to provide reliable estimates of security against
        guessing attacks. This includes collecting the largest-ever
        corpus of user-chosen passwords, with nearly 70 million, the
        largest list of human names ever assembled for research, the
        largest data sets of real answers to personal knowledge
        questions and the first data published about human choice of
        banking PINs. This data provides reliable numbers for designing
        security systems and highlights universal limitations of
        human-chosen secrets.
    </description>
  </item>
  <item rdf:about="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-818.html">
    <title>New approaches to operating system security extensibility</title>
    <link>http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-818.pdf</link>
    <dc:creator>Watson, Robert N. M.</dc:creator>
    <dc:publisher>University of Cambridge, Computer Laboratory</dc:publisher>
    <dc:date>2012-04</dc:date>
    <description>
        This dissertation proposes new approaches to commodity computer
        operating system (OS) access control extensibility that address
        historic problems with concurrency and technology transfer.
        Access control extensibility addresses a lack of consensus on
        operating system policy models at a time when security
        requirements are in flux: OS vendors, anti-virus companies,
        firewall manufacturers, smart phone developers, and application
        writers require new tools to express policies tailored to their
        needs. By proposing principled approaches to access control
        extensibility, this work allows OS security to be “designed in”
        yet remain flexible in the face of diverse and changing
        requirements.
        
        I begin by analysing system call interposition, a popular
        extension technology used in security research and products, and
        reveal fundamental and readily exploited concurrency
        vulnerabilities. Motivated by these failures, I propose two
        security extension models: the TrustedBSD Mandatory Access
        Control (MAC) Framework, a flexible kernel access control
        extension framework for the FreeBSD kernel, and Capsicum,
        practical capabilities for UNIX.
        
        The MAC Framework, a research project I began before starting my
        PhD, allows policy modules to dynamically extend the kernel
        access control policy. The framework allows policies to
        integrate tightly with kernel synchronisation, avoiding race
        conditions inherent to system call interposition, as well as
        offering reduced development and technology transfer costs for
        new security policies. Over two chapters, I explore the
        framework itself, and its transfer to and use in several
        products: the open source FreeBSD operating system, nCircle’s
        enforcement appliances, and Apple’s Mac OS X and iOS operating
        systems.
        
        Capsicum is a new application-centric capability security model
        extending POSIX. Capsicum targets application writers rather
        than system designers, reflecting a trend towards security-aware
        applications such as Google’s Chromium web browser, that map
        distributed security policies into often inadequate local
        primitives. I compare Capsicum with other sandboxing techniques,
        demonstrating improved performance, programmability, and
        security.
        
        This dissertation makes original contributions to challenging
        research problems in security and operating system design.
        Portions of this research have already had a significant impact
        on industry practice.
    </description>
  </item>
</rdf:RDF>
