Computing for the Future of the Planet (CFTFP)

Digital technology is becoming an indispensable component of our lives, society, and the environment. Computing for the Future of the Planet is a framework for computing in the context of problems facing the planet. It is currently the group's main theme, comprising several projects, which are described below.

The framework has four goals:

  1. An optimal digital infrastructure: energy efficiency, energy monitoring and control systems, and better reuse of components.
  2. Sensing and optimising with a global world model: real-time environmental sensing, the use of novel cost algorithms, and global-scale monitoring and analysis.
  3. Reliably predicting and reacting to our environment: global-scale computing for accurate predictions, a better understanding of models and their validity, and model correction.
  4. Digital alternatives to physical activities: new tools, environments and infrastructure that shift activities such as wealth creation and entertainment into cyberspace, reducing their impact on the physical world.

In February 2010, the CFTFP programme was awarded an 'unrestricted' grant from Google to fund research across the four areas above for two to three years. See the press article for more details.


Videos of talks given on the framework at Google and the University of Edinburgh are available. 

Wireless Communications

Wireless technology remains at the forefront of data communications and is now also central to world-sensing applications. The group conducts fundamental research into digital transmission technology at the intersection of applied mathematics, computing and engineering, with a tradition of tackling problems that arise in the practical implementation of state-of-the-art wireless communication techniques.

Our research into wireless communications spans the physical and medium access control layers. It builds on years of work developing channel models and channel codes for single- and multiple-antenna communication, as well as modulation and detection schemes, applying that knowledge and experience in newer areas such as ultra-wideband systems, wireless sensor networks, cooperative networks, cross-layer schemes and channel models for infrastructure monitoring.

FRESCO: A Fabric For Reproducible Computation

We are currently accepting PhD applications for the FRESCO project.  Please contact Dr Rip Sohan if you're interested in applying.

The Problem

Over the last twenty years, civil society has become increasingly dependent on computing technology, both for its day-to-day functioning and for its longer-term advancement. Computing technology is now embedded in the backbone of virtually all professional and personal aspects of society. In particular, technology has been immensely useful in providing platforms for the collection, extraction, manipulation and presentation of data. This, in turn, has been fundamental in enabling, directly or indirectly, the computation of a wide range of empirical and quantitative data.

Technology is particularly suited to supporting the computation of data for a number of reasons: it makes it simple to collect accurate, fine-grained data automatically, reliably and at large scale; it allows the collected data to be accurately classified, manipulated and computed over; it makes it possible to create accurate virtual representations of real-world objects and behaviour and to experiment with these representations empirically; and it enables quantitative reasoning over complex, interlinked and multivariate information.

However, the use of technology as a platform for computing quantitative and empirical data is not foolproof. In particular, there is rarely any mechanism for detecting or preventing the use of incorrect data in computations. The consequences of this weakness can be serious. For example, in July 2009 Northern Ireland Water sent out 7,800 incorrect non-domestic water bills based on erroneous data, costing the company £250,000 in lost revenue.

Similarly, incorrect or inaccurate computations may produce incorrect or inaccurate results, the repercussions of which can be pronounced. For example, a 2007 report by the Center For Economic and Public Policy in Washington D.C. found evidence that, between 2000 and 2002, faulty economic analysis by the IMF caused it to consistently overestimate Argentina's GDP growth. As a consequence, the IMF considered the country a safe risk and had lent it billions of dollars by the time its economy collapsed in early 2002.

While the consequences of using incorrect or inaccurate data or computation results are highly dependent on the problem domain and the manner in which the data or results are used, two trends suggest that, in general, these consequences are likely to grow in severity.

Firstly, there is the growing open data movement. All over the world, governments and private companies are releasing more and more archived and real-time datasets to the public across diverse areas. Both individuals and corporations are beginning to rely on these datasets to provide useful services. Data-heavy domains such as scientific computing are moving towards models where both source and computed data are made available as evidence of correctness, as justification for decisions, for elucidation, and for others to use and extend. As this trend gains momentum we expect to see increased reliance on both original and derived datasets. In this setting, datasets made incorrect or inaccurate by erroneous computations are likely to have far-reaching negative effects, especially where the services built on them are used by, or affect, a large number of people.

Secondly, there is a growing movement towards highly linked data, where datasets are directly connected to, or indirectly derived from, multiple other datasets. In this situation, errors in one dataset can affect many others. Similarly, computational errors in one stage of a data derivation pipeline may produce derived data errors many stages later. As data becomes increasingly linked, it is important that we are able to trace the origins of data, both in terms of which source data was involved in its computation and how that source data was transformed, so that errors can be identified and corrected quickly and accurately.

Even in scenarios where data provenance is unimportant, computational provenance helps users to reason about and trust results. For example, there are increasing calls to base federal and national policy decisions on empirical data in the interests of greater government transparency and better-informed citizens. Here, citizens would benefit from computational provenance because it enables them to confirm the decision-making process or, if necessary, refute it with evidence.

FRESCO

The Fabric For Reproducible Computation (FRESCO) project aims to provide operating-system-level support for computational and data provenance in a general-purpose UNIX (Linux) system.

The overall goal of the project is to allow users to query and archive the entire computational workflow that resulted in the creation or modification of any file on a machine, and to export the archived workflow off-host so that the result can be independently reproduced on another system.

FRESCO provides a tool for creating FRESCO archives that can be independently replayed on another host. FRESCO archives are self-contained files containing the workflow description, application binaries, shared libraries and input files necessary to reproduce a result on another host. FRESCO achieves this by (1) tracking the execution of applications on the system, (2) recording the effects of application execution as file-level filesystem changes, and (3) capturing the application binaries, system libraries and non-deterministic inputs and events necessary to reproduce these changes independently off-host.

FRESCO also provides support for more advanced workflow manipulation. It is possible to modify workflows to introduce or delete intermediate steps, change input data or non-deterministic events and branch and merge workflows.
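To make the idea of a self-contained, replayable archive concrete, here is a minimal Python sketch. The manifest layout, file names and replay logic are illustrative assumptions for exposition, not FRESCO's actual archive format or tooling.

```python
import json
import subprocess
import tarfile

# Hypothetical manifest for a two-step workflow; binaries, paths and recorded
# events are made-up examples, not FRESCO's real format.
example_manifest = {
    "workflow": [
        {"step": 1, "binary": "bin/preprocess", "args": ["input/raw.csv"],
         "env": {"LC_ALL": "C"}, "output": "out/clean.csv"},
        {"step": 2, "binary": "bin/analyse", "args": ["out/clean.csv"],
         "env": {"LC_ALL": "C"}, "output": "out/report.txt"},
    ],
    # Non-deterministic inputs captured at record time so a replay can
    # substitute the original values (timestamps, random seeds, ...).
    "recorded_events": {"start_time": "2010-02-01T10:00:00Z", "rng_seed": 42},
}

def replay(archive_path: str, workdir: str = "replay_dir") -> None:
    """Unpack a self-contained archive and re-run each workflow step in order,
    using only the binaries, libraries and inputs shipped inside it."""
    with tarfile.open(archive_path) as tar:
        tar.extractall(workdir)
    with open(f"{workdir}/manifest.json") as f:
        manifest = json.load(f)
    for step in manifest["workflow"]:
        subprocess.run([step["binary"], *step["args"]],
                       cwd=workdir, env=step["env"], check=True)
```

A real system would additionally intercept system calls to record the file-level changes and non-deterministic events mentioned above; this sketch only shows why shipping the workflow description together with its binaries and inputs is enough to re-run it elsewhere.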

Currently, the FRESCO project has five major themes:

  1. Completeness: Our aim is to create a general-purpose tool that works on a standard kernel and does not require application specialisation. We are striving to create a platform that is usable in any application domain and that can be readily deployed on a general-purpose Linux system without requiring expert understanding. To that end, we are open-sourcing our efforts for adoption by the wider community.
  2. Performance: For FRESCO to be considered a standard system component, it must not significantly slow down application execution. To this end, we are concentrating on minimising the computational and data storage overheads of the platform.
  3. Distributed Systems: Large computations are commonly spread across multiple hosts, some of which may not be controlled by FRESCO. This theme of the project aims to optimise the behaviour of FRESCO across multiple distributed hosts so that users are able to reproduce workflows created across multiple hosts.
  4. Hardware Support: The hardware theme seeks to identify and explore changes in hardware useful for minimising the overhead of FRESCO as outlined in the Performance theme.
  5. Verification & Validation: The final theme explores the theoretical aspects of computational and data provenance. We aim to develop techniques that allow us to determine why differences arise during workflow replay and to correct them automatically where possible.

New approaches to programming in the sciences

Programming languages provide an interface for developing increasingly complex models in science. However, as computer models grow more complex, it is increasingly difficult to deliver on core requirements such as verifiability, maintainability, understandability, validity, and portability.

Managing software complexity more effectively has been a focus of programming language research for many years, yet we see little adoption of new approaches in the natural sciences. Instead, scientists continually strive to evolve their software to cope with more complex models, bigger data sets and novel execution architectures.

We are running a multidisciplinary project involving computer scientists and natural scientists to understand how state-of-the-art programming language research can be leveraged for more effective programming in the sciences.

Nigori - Secrets in the cloud

The aim of the Nigori project is to develop a practical, application-neutral, peer-reviewed mechanism for storing sensitive user data in the cloud in such a way that neither the cloud provider nor the application developer can read any of the stored data. Nigori handles the synchronisation of data across a distributed system of devices using untrusted cloud services, with local reconciliation using logic specified by the application developer.
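The core idea, that keys are derived and data encrypted on the client so the cloud store only ever holds ciphertext, can be sketched as follows. This is a generic illustration in Python using the `cryptography` library; Nigori's actual key-derivation scheme, protocol and storage format are not reproduced here.

```python
import base64
import os

from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

def derive_key(password: bytes, salt: bytes) -> bytes:
    # The key is derived on the client from the user's password; the cloud
    # provider only ever sees ciphertext, never the password or the key.
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt,
                     iterations=200_000)
    return base64.urlsafe_b64encode(kdf.derive(password))

untrusted_store = {}  # stand-in for an untrusted cloud key-value service

def put_secret(key: bytes, name: str, secret: bytes) -> None:
    untrusted_store[name] = Fernet(key).encrypt(secret)

def get_secret(key: bytes, name: str) -> bytes:
    return Fernet(key).decrypt(untrusted_store[name])

salt = os.urandom(16)
key = derive_key(b"user passphrase", salt)
put_secret(key, "bank-pin", b"1234")
assert get_secret(key, "bank-pin") == b"1234"
```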

There are (or will be) implementations in Java, Python, Dart and OCaml, supporting many client and server platforms, in particular Android smartphones and desktop browser extensions.

This project is funded by Google.

Cross-Layer Wireless Schemes

Current trends in wireless networking raise issues that call for thorough investigation and improved designs for channel access and contention mechanisms. Our interests lie in scalability and efficiency, as well as low power consumption and reliable QoS in challenging scenarios. We are interested in the design, modelling and implementation of wireless schemes that can cope with large, dense networks and rapid topology changes while keeping complexity low.

As part of this research, we developed Multi-Carrier Burst Contention (MCBC), a new class of cross-layer wireless protocols that use randomised, multi-dimensional contention schemes. MCBC exploits both the frequency and time domains, mapping contention onto random OFDM subcarriers and employing custom-shaped energy bursts.
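The following toy simulation illustrates why contending in two dimensions (time slots and subcarriers) sharply reduces collisions compared with contending in time alone. It is a simplified assumption-laden model, not MCBC itself: the slot and subcarrier counts, the winner rule and the absence of burst shaping are all illustrative choices.

```python
import random

def contend(num_nodes: int, num_slots: int = 4, num_subcarriers: int = 16):
    """Toy model of randomised, two-dimensional contention: each node signals
    on one random (slot, subcarrier) pair, and the node whose burst falls in
    the earliest slot and, within that slot, on the highest subcarrier wins."""
    picks = {node: (random.randrange(num_slots),
                    random.randrange(num_subcarriers))
             for node in range(num_nodes)}
    best = min(picks.values(), key=lambda p: (p[0], -p[1]))
    return [node for node, pick in picks.items() if pick == best]

# Estimate how often two or more nodes survive contention (a residual collision).
trials = 10_000
collisions = sum(len(contend(50)) > 1 for _ in range(trials)) / trials
print(f"residual collision probability ~ {collisions:.3f}")
```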

The MCBC project received an award from Microsoft Research as part of the Microsoft Research Software Radio Academic Program 2010 competition. The grant, in the form of SDR hardware equipment, will allow the implementation of fully featured MCBC prototypes and advance research into multi-carrier techniques.

Sensing for Sport And Managed Exercise

The SESAME consortium is a multidisciplinary group formed to investigate the use of wireless sensor-based systems (e.g. on-body sensors, pressure sensors and trajectory aids) with offline and real-time processing and feedback (e.g. high-speed and infra-red cameras, synchronisation mechanisms and systems-on-chip) to enhance the performance of elite athletes and of young athletes identified as having world-class potential.

The overall goals of the project are to enhance performance, improve coach education and advance sports science, using a range of hardware and software technologies. In doing so, we will build on the extensive experience, both within and outside the consortium, of applying sensor systems to human and animal monitoring, and we will seek to advance that knowledge both in terms of outcomes specific to sport and in terms of computer science fundamentals.

Cooperative Wireless Networks

Conventionally, wireless communications share limited resources through competition and allocation. One of the major challenges is interference, which arises when the same signal travels over multiple propagation paths. Rather than simply increasing the resilience of communication to combat this effect, our work investigates how cooperation between users can exploit and benefit from these multipath effects. This research field is known as cooperative or collaborative communications: users act as relays to assist other users, or, more simply, multiple relays assist a single user's transmission. Our investigation aims to optimise the performance of multi-user systems employing high-performance codes, i.e. block and turbo codes. We believe our work offers a significant contribution beyond existing work, which focuses primarily on pure relay networks and on multi-user networks employing simple coding schemes.

Indoors localisation and tracking

Location information is an important source of context for ubiquitous computing systems. Today, GPS allows a device to locate itself to within 25 m in an outdoor environment, but indoor localisation remains an open research problem. Researchers have yet to develop a system that can be deployed easily and cheaply in a large building while still providing accurate localisation.

The group has a long history of research and development in indoor localisation systems, e.g. the ultrasound-based Bat system, a wearable indoor localisation system based on inertial measurements and particle filtering, and camera-based trolley tracking in stores. We are currently expanding our research in indoor localisation using RF signals coupled with opportunistic signals (which exist around us inherently), software-defined radio and sensors on mobile phones.
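As a pointer to how particle filtering is used in such systems, here is a minimal sketch of one predict-update-resample step for estimating a 2-D position from noisy range measurements. The beacon positions, noise levels and motion model are made-up illustrative values, not parameters of the group's deployed systems.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed beacon positions (metres) and an initial cloud of
# position hypotheses spread over the floor plan.
beacons = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
particles = rng.uniform(0.0, 10.0, size=(2000, 2))

def pf_step(particles, measured_ranges, motion_std=0.3, range_std=0.5):
    # Predict: diffuse particles with a simple random-walk motion model.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Update: weight each particle by how well it explains the measured ranges.
    dists = np.linalg.norm(particles[:, None, :] - beacons[None, :, :], axis=2)
    weights = np.exp(-0.5 * ((dists - measured_ranges) / range_std) ** 2).prod(axis=1)
    weights /= weights.sum()
    # Resample in proportion to the weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]

true_position = np.array([3.0, 7.0])
ranges = np.linalg.norm(beacons - true_position, axis=1) + rng.normal(0.0, 0.5, 3)
particles = pf_step(particles, ranges)
print("estimated position:", particles.mean(axis=0))
```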

OpenRoomMap

We believe that detailed indoor plans of buildings are best entered and kept up to date by the buildings' users themselves. OpenRoomMap is a system that allows users to annotate building floor maps with finely positioned furniture, appliances and room features through a user-friendly graphical interface. It can also be used to easily locate and visualise people's offices.

OpenRoomMap is deployed in the Computer Laboratory building, where users have populated the floor maps with various annotations: the names of the floorboxes in offices, the locations of air vents, the names and positions of workstations, and the positions of desks, shelving, appliances and more.

CSK Energy

The Cambridge Sensor Kit (CSK) is a pervasive platform for energy management, research and development, implemented on top of the UCAM-WSB100 wireless sensor board, which was itself developed within the group. As part of the CFTFP framework, it is used to quantify effective and wasteful energy usage in order to improve efficiency, by performing fine-grained energy monitoring (across housing and the university), modelling consumption, and providing users with energy feedback, consumption comparison and display, and energy control tools. The first deployment of the system was in May 2009 within the Laboratory's building. Since then it has been in continuous operation and has been expanded into households. The CSK wireless subsystem has been released as an open-source hardware platform.
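The kind of accounting this enables can be sketched in a few lines: integrating per-appliance power samples into energy and separating out consumption that looks wasteful. The sample values and the "working hours" rule below are assumptions for illustration only, not the CSK's actual analysis pipeline.

```python
from datetime import datetime, timedelta

# Made-up one-minute power readings for a single appliance.
samples = [(datetime(2010, 5, 1, 2, 0) + timedelta(minutes=i), 120.0)
           for i in range(60)]  # (timestamp, watts)

def energy_kwh(readings, interval_s=60):
    # Integrate power over time: watts * seconds / 3.6e6 joules per kWh.
    return sum(watts for _, watts in readings) * interval_s / 3.6e6

def off_hours_kwh(readings, work_start=8, work_end=18, interval_s=60):
    # Treat consumption outside working hours as potentially wasteful.
    off = [(t, w) for t, w in readings if not (work_start <= t.hour < work_end)]
    return energy_kwh(off, interval_s)

print(f"total:              {energy_kwh(samples):.3f} kWh")
print(f"outside work hours: {off_hours_kwh(samples):.3f} kWh")
```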

The CSK was initially used to support the experimental work of SESAME, a UK-funded project investigating the use of sensor systems (in particular on-body wireless sensors) with offline and real-time processing and feedback to enhance the performance of elite athletes.

Automated Assessment of Kinaesthetic Performance in Rowing

Sentient, or context-aware, computers can sense and interpret their environment in order to act on or disseminate an "understanding" of what they perceive, and to optimise, or allow us to optimise, our behaviour. This work, affiliated with the EPSRC SESAME project, addresses this within the domain of rowing.

A system has been developed that helps an athlete maintain consistently good technique by providing real-time and post-workout feedback on aspects of their technique, sensed non-invasively as they row on an ergometer. The dataset collected is used to evaluate algorithms for assessing the similarity of two performances and for supplementing coaches' verbal feedback by recognising common faults in technique.
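The project does not specify which similarity measures it evaluates; as one common way of comparing two movement traces whose strokes differ slightly in timing, here is a minimal dynamic time warping (DTW) sketch in Python. The synthetic sine-wave "strokes" are stand-ins for real sensor traces.

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Dynamic time warping distance between two 1-D movement traces,
    tolerating small differences in stroke timing and rate."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return float(cost[n, m])

# Two synthetic stroke traces; the second is sampled at a slightly slower rate.
stroke_a = np.sin(np.linspace(0.0, 2.0 * np.pi, 100))
stroke_b = np.sin(np.linspace(0.0, 2.0 * np.pi, 110))
print(f"DTW distance: {dtw_distance(stroke_a, stroke_b):.3f}")
```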

TRVE DATA: Placing a bit less trust in the cloud

Cloud and software-as-a-service applications such as Google Docs, Evernote, iCloud and Dropbox are very convenient for users, but problematic from a security point of view. Because these services process data in unencrypted form on their servers, users must blindly trust the cloud provider to prevent unauthorised access and to maintain the integrity of the data. A security breach at the cloud provider could have disastrous consequences.

In this project, we are exploring techniques for Trust-Reducing Verifiable Exchange of data (TRVE DATA, pronounced "true data"). Our goal is to create the foundations for applications that are as usable and convenient as today's cloud services, while reducing the amount of trust that is placed in third parties.
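One ingredient of reducing trust in a sync server is letting devices verify each other's updates end-to-end, so the server can relay data without being able to alter or forge it. The sketch below shows that idea with Ed25519 signatures via the Python `cryptography` library; it is a generic illustration under assumed data formats, not the TRVE DATA protocol itself.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# A device signs each update it uploads; other devices verify the signature,
# so an untrusted server relaying the update cannot tamper with it unnoticed.
device_key = Ed25519PrivateKey.generate()
public_key = device_key.public_key()  # shared with the user's other devices

update = b'{"doc": "notes.txt", "op": "append", "data": "hello"}'
stored_on_server = (update, device_key.sign(update))

received_update, signature = stored_on_server
try:
    public_key.verify(signature, received_update)
    print("update verified; safe to apply")
except InvalidSignature:
    print("update rejected: tampered with or forged")
```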