Computer Laboratory

CST Part IB group projects

Proposed design briefs for 2012

We expect 12 student teams in 2012, and we are offering 14 different project/client combinations, to provide some contingency and matching to available student skills/interests.


Proposed Project Briefs for 2012

Douglas Squirrel, TIM Group - douglas.squirrel(at)timgroup.com
Top Tips Two

Every day, thousands of the world's best stock pickers write to favoured clients with their best buy or sell tips of the day. You'll analyse a large subset of this trade-idea commentary, determine whether some or all of the supplied attributes are linked to positive or negative performance, and write a ranking program that predicts the quality of a trade idea based on appropriate data elements. If you find a signal, we'll test your hypotheses by running your code against our full corpus of nearly two million ideas and work with you to refine as needed. And if it's sufficiently predictive, we will look to incorporate your algorithm into our online trade-idea system to rate ideas in real time for brokers and fund managers.

To work on this project you will need to process old but still somewhat sensitive data and will thus have to be willing to sign a confidentiality agreement and verify destruction of data on completion.

Tim Scammell ARM - tim.scammell(at)arm.com
Data Centre Footprint

ARM are planning a major data centre project. This will be highly instrumented for power consumption. We want to understand the carbon footprint that this equipment creates and the costs that this generates and then provide this data (both quantitative and qualitative) to both our internal management and engineers on a regular / real-time basis. We want this material to be sensitive to even small changes in our environment and allow us to show which components are the best / worst environmentally. We want to do this now and build up a large pool of data which we can then use to contrast against future modifications to the configuration.

Tim Scammell ARM - tim.scammell(at)arm.com
Frigid Energy

Refrigerators are arguably the least sophisticated appliances in most houses, and big energy consumers. This project will create an 'energy signature' that measures instantaneous power consumption to infer what the thermostat setting is, whether the door is being opened and closed too much, or whether it would be sensible to buy a new fridge. In the summertime, fridges use more power than any other device in the house, so some digital optimisation could provide huge benefits.
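As a rough illustration of what an 'energy signature' might contain, the sketch below derives two simple features from a series of power readings: the fraction of time the compressor appears to be running, and how often it starts. The 50 W threshold, the function names and the sample readings are all assumptions made for this example; a real fridge would need calibrating.

```python
def duty_cycle(samples_watts, on_threshold_watts=50.0):
    """Fraction of samples in which the compressor appears to be running.

    The 50 W default threshold is an assumption for illustration only.
    """
    if not samples_watts:
        raise ValueError("need at least one sample")
    running = sum(1 for w in samples_watts if w >= on_threshold_watts)
    return running / len(samples_watts)


def count_compressor_starts(samples_watts, on_threshold_watts=50.0):
    """Count off-to-on transitions in the sampled power readings."""
    starts = 0
    was_running = False
    for w in samples_watts:
        is_running = w >= on_threshold_watts
        if is_running and not was_running:
            starts += 1
        was_running = is_running
    return starts


# Hypothetical readings in watts, taken at a fixed sampling interval.
readings = [2, 2, 90, 95, 2, 2, 88, 91, 93, 2]
features = {
    "duty_cycle": duty_cycle(readings),
    "starts": count_compressor_starts(readings),
}
```

A duty cycle that drifts upwards over weeks, or unusually frequent starts, could be the kind of evidence used to infer a cold thermostat setting, frequent door opening or an ageing fridge.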

Tim Scammell ARM - tim.scammell(at)arm.com
Integrated Diary Scheduling

Despite advances in diary systems, it is still difficult to automate the scheduling of meetings. Even an application that made an initial suggestion to a Doodle-like system would be an improvement on the current situation. This project will build an application that aids the process of scheduling meetings, and can include specifying essential and desirable participation, a deadline for the meeting, etc. This project will involve some Android development and interaction with Exchange diaries.

Steve Smith - steve.smith(at)admin.cam.ac.uk
Mobile CamSIS

This project offers the opportunity to inform future mobile information services at the University. The work will involve the creation of a flexible and adaptable framework that could be used to deliver key University information to a student's mobile phone. The current University systems are primarily tailored for desktop/tablet access, and do not necessarily scale well to smaller form factors such as mobile phones. To support the project team, access can be given to Exam timetables and anonymised Grade information from CamSIS via Web Services, but we would also ask them to consider what other centrally held information would be useful for students to retrieve on a mobile phone. The intention is not to simply replicate or replace the breadth and depth of functionality in the central applications, but to identify and provide easy access to the information of most benefit to the student. The team may also investigate whether functionality not currently available, such as SMS alerts, or directions to University buildings should be included. The framework would need to allow other information sources to be easily incorporated, to support the main handsets in use by students at the University, and provide the relevant level of security controls to ensure that sensitive information remains secure.

James Belsey, Morgan Stanley - James.Belsey(at)morganstanley.com
Improve the speed of the Scala compiler

The Scala compiler is slow to turn Scala source code into something executable. We would like a team to work on improving compile speed. Since the Scala compiler is itself written in Scala, there are two ways in which compile speed can improve: first, the compiler code itself can be made more efficient; second, the execution of Scala code in general can be made more efficient. We would like the students to consider the scope of this project unbounded: whatever can be done to reduce the time it takes to get things to run would be considered beneficial.

Limo Hearn, Morgan Stanley - Limo.Hearn(at)morganstanley.com
"IntelliTrade" Derivatives Trade Capture

Making sure that Traders enter their trades quickly and accurately is critical to Morgan Stanley: without an error-free view of what we've traded and with whom, we are unable to manage our risk. Unfortunately capturing a trade in an Equity Derivative can be a complex business and to this end, we would like to investigate heuristic methods for helping traders to capture their trades correctly and efficiently.

Specifically, as traders enter trades into their trade capture system, we would like to be able to prompt them with the values they are statistically most likely to choose as they enter certain key fields. This implies the development of algorithms or statistical techniques for analysing historic trade data, and the ability to display the results of this analysis to the users of the trade capture systems in real-time.

While this is a piloting exercise, any solution would need to have the potential for extremely high performance. We would also require it to be general-purpose and easily adaptable to other trade capture systems within the Firm. Finally, although we would look to employ Open Source technology within the solution wherever possible, we would expect any server technology developed as part of the project to be written in Scala and GUI technology to be written in .NET/WPF.
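One very simple reading of "statistically most likely" is a frequency count of past values, conditioned on the fields the trader has already filled in. The sketch below shows that idea only; the class name, the field values and the choice of context are hypothetical, and it is written in Python for brevity rather than in the Scala and .NET/WPF stack the brief asks for.

```python
from collections import Counter, defaultdict


class FieldSuggester:
    """Suggest values for one field, given the fields already entered."""

    def __init__(self):
        # Maps a context (e.g. the underlying's ticker) to value counts.
        self._counts = defaultdict(Counter)

    def observe(self, context, value):
        """Record one historic trade: in this context, this value was chosen."""
        self._counts[context][value] += 1

    def suggest(self, context, k=3):
        """Return up to k values, most frequently chosen first."""
        counts = self._counts.get(context)
        if counts is None:
            return []
        return [value for value, _ in counts.most_common(k)]


# Hypothetical history of (underlying, option type) pairs.
suggester = FieldSuggester()
for underlying, option_type in [
    ("VOD.L", "call"),
    ("VOD.L", "call"),
    ("VOD.L", "put"),
    ("BP.L", "put"),
]:
    suggester.observe(underlying, option_type)

top_for_vod = suggester.suggest("VOD.L")
```

Keeping the counts in memory makes each lookup cheap, which matters for prompting as the trader types; a fuller solution would also have to weigh recency and more than one conditioning field.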

Matt Johnson, Morgan Stanley - Matt.W.Johnson(at)morganstanley.com
Finding failures before they happen: detecting anomalies in large volumes of timeseries data

We collect performance data statistics every five minutes from disk storage arrays used throughout the Morgan Stanley technology plant into a large globally distributed database. This is around 1,500 arrays, each array having multiple controllers, tens of RAID groups, hundreds of LUNs, and hundreds of disks. While it is possible to easily trace back a problem once the symptoms it has caused have been reported, it should also be possible to analyze the normal behaviour of each of these components and determine heuristics which indicate 'out of character' behaviour which warrants investigation and proactive response. This is complicated by the fact that 'normal' will vary based on time (is it during the trading day, after the trading day, at the weekend, at month end...?), and potentially other business conditions (e.g. high trading volumes). We would be able to provide a suitably anonymized set of data to sample. The aim of the project is to investigate and implement an algorithm or set of algorithms which can examine this data and spot anomalous events, evaluate its effectiveness, and discuss other potential approaches to the problem.
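Because 'normal' depends on when a measurement is taken, one possible starting point is to keep a separate baseline for each time bucket and flag values that sit far from that bucket's own mean. The sketch below does this with a z-score; the bucket labels, the threshold of three standard deviations and the sample values are assumptions for illustration, not part of the brief.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_baseline(history):
    """Build {bucket: (mean, stdev)} from (bucket, value) pairs.

    Buckets with fewer than two samples are skipped, since a sample
    standard deviation cannot be computed for them.
    """
    by_bucket = defaultdict(list)
    for bucket, value in history:
        by_bucket[bucket].append(value)
    return {
        bucket: (mean(values), stdev(values))
        for bucket, values in by_bucket.items()
        if len(values) >= 2
    }


def is_anomalous(baseline, bucket, value, threshold=3.0):
    """True if value is more than `threshold` deviations from the bucket mean."""
    if bucket not in baseline:
        return False  # no history for this bucket, so nothing to compare with
    mu, sigma = baseline[bucket]
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical I/O rates for one LUN, bucketed by day type and hour.
history = [("weekday-10h", v) for v in (100, 102, 98, 101, 99)]
history += [("weekend-10h", v) for v in (10, 12, 11, 9, 10)]
baseline = build_baseline(history)
```

With this data a reading of 100 is unremarkable at 10:00 on a weekday but is flagged at 10:00 at the weekend, which is the behaviour the brief describes. Month-end effects or trading volumes would need further buckets or a richer model.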

Ada Ng, Morgan Stanley - Ada.Ng(at)morganstanley.com
Rebalancing logic for Cloud platform management

Cloud computing offers us a number of advantages. Two of these are elasticity and resource rebalancing. Both require making placement or resource allocation decisions based on capacity metrics and usage patterns. Companies like Amazon and Google (with App Engine) all have their own computation logic and tooling to offer these features to customers.

This project will build a rebalancing evaluation tool which can take in various metrics of the hosts, such as CPU, memory, network I/O etc, as well as application-oriented metrics such as throughput, number of users, message sizes etc, to deduce the most efficient placement decision. The system should also allow the weighting of each metric to be tuneable: for example, memory could be more important to one type of application than CPU, while for another the number of users accessing the application can have a high impact on resource use. In the enterprise environment, multiple instances of the same application would be deployed on different hosts in different regions for high availability and load balancing. The system should therefore take into account the need to re-balance the placement of applications in groups based on factors such as location.

The three typical use cases are: 1. Suggesting which hosts a new application should be deployed on, based on the current free capacity of each host in the plant; 2. Producing system capacity metrics of the plant over the last month, and generating reports on how the plant should be re-balanced and how applications should be distributed across the plant; 3. Detecting that the memory consumption of a particular container in a web farm has been increasing steadily over the last month due to increased traffic and will in the near future exceed available resources, hence the container should be moved to another set of hosts.
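The first use case, together with the tuneable weighting described above, could be sketched as a weighted score over each host's free capacity. The host names, metric names and numbers below are hypothetical, and a weighted average is only one of many possible scoring rules.

```python
def placement_score(free, weights):
    """Weighted average of free-capacity fractions (0..1); higher is better."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[metric] * free[metric] for metric in weights) / total


def suggest_hosts(hosts, weights, count=1):
    """Return the `count` host names with the best score for these weights."""
    ranked = sorted(
        hosts,
        key=lambda name: placement_score(hosts[name], weights),
        reverse=True,
    )
    return ranked[:count]


# Hypothetical free capacity per host, as a fraction of the total.
hosts = {
    "host-a": {"cpu": 0.8, "memory": 0.2},
    "host-b": {"cpu": 0.3, "memory": 0.9},
}

memory_heavy = suggest_hosts(hosts, {"cpu": 1, "memory": 3})
cpu_heavy = suggest_hosts(hosts, {"cpu": 3, "memory": 1})
```

Changing the weights changes the recommendation: the memory-heavy application is steered to host-b and the CPU-heavy one to host-a. Grouping instances by region, as the brief requires, would add constraints on top of this ranking.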

John Baing, Cisco - jbain(at)cisco.com
Online Programming Testing

In screening applicants for software development jobs, there is a need to mark questions where candidates write code to solve a problem. These are currently marked manually. We would like a system in which answers can be marked automatically.

This would use a completely new, simple programming language (so that candidates are not advantaged by familiarity), running on a virtual machine, which can be used to test understanding of fundamental computing issues. The language should be similar in abstraction level to 'C' and simple enough to learn in a few minutes.

The end product will be a web-based testing system which i) explains the VM and language to a candidate; ii) allows a tester to set problems with descriptive text, a definition of the initial and final states of the VM and any required outputs; iii) allows a candidate to enter his or her solution; and iv) automatically marks the solution.
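The marking step in (iv) can be pictured as running the candidate's program on a VM from each initial state and comparing the result with the expected final state. The sketch below uses an invented five-instruction register machine purely for illustration; it is not the C-like language the brief calls for, and the instruction set, step limit and scoring rule are all assumptions.

```python
def run(program, registers, max_steps=10_000):
    """Run a list of (op, *args) tuples on a copy of `registers`."""
    regs = dict(registers)
    pc = 0
    steps = 0
    while 0 <= pc < len(program):
        steps += 1
        if steps > max_steps:
            raise TimeoutError("step limit exceeded")
        op, *args = program[pc]
        if op == "set":        # set reg, constant
            regs[args[0]] = args[1]
        elif op == "add":      # add dst, src   (dst += src)
            regs[args[0]] += regs[args[1]]
        elif op == "dec":      # dec reg        (reg -= 1)
            regs[args[0]] -= 1
        elif op == "jz":       # jz reg, target (jump if reg == 0)
            if regs[args[0]] == 0:
                pc = args[1]
                continue
        elif op == "jmp":      # jmp target
            pc = args[0]
            continue
        else:
            raise ValueError(f"unknown instruction: {op}")
        pc += 1
    return regs


def mark(program, cases, max_steps=10_000):
    """Fraction of (initial, expected_final) cases the program passes."""
    if not cases:
        raise ValueError("need at least one test case")
    passed = 0
    for initial, expected in cases:
        try:
            final = run(program, initial, max_steps)
        except (TimeoutError, ValueError, KeyError):
            continue  # a crash or runaway loop fails the case
        if all(final.get(reg) == value for reg, value in expected.items()):
            passed += 1
    return passed / len(cases)


# Hypothetical problem: leave a * b in register c (b is non-negative).
cases = [({"a": 3, "b": 4}, {"c": 12}), ({"a": 5, "b": 0}, {"c": 0})]
solution = [
    ("set", "c", 0),
    ("jz", "b", 5),
    ("add", "c", "a"),
    ("dec", "b"),
    ("jmp", 1),
]
score = mark(solution, cases)
```

The step limit matters because candidate code may never terminate; here a runaway program simply fails the case instead of hanging the marker.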

Simon Rous Credit Suisse (Simon.Rous(at)credit-suisse.com)
Personal Photograph Organisation and Quality Scoring

Many people take far too many photographs. The task is to develop a tool that helps select the ones worth keeping. Similar photographs could be grouped, e.g. by time and place information, if available, or by content recognition. Photographs could be scored either by a user or automatically (contrast, focus, composition, etc.). The system should make it simple for the user to approve the proposed categorisation of photos.

Peter Cowley Camdata (Peter.Cowley(at)camdata.co.uk)
Modern Media at May Balls

The combination of smart phones with near field communication (NFC) and weatherproof radio frequency identification (RFID) technology would provide an interesting opportunity for May Balls in terms of both security and interfacing to social media. The project would produce software for an Android phone with NFC and a server to: (i) allow authorised entry to a May Ball; (ii) aid roving security guards (detecting gate crashers); (iii) allow Facebook-like activities/locations; and (iv) upload real-time photos to both the May Ball site and Facebook.

You should approach Ball committees to assess the market, and hooks should be provided to a Ball ticketing system and to Twitter feeds.

Rob Mullins (pro tem) Raspberry Pi (Robert.Mullins(at)cl.cam.ac.uk)
Collaborative Programming for CS Education

The Raspberry Pi is a very low cost computer designed for use as an educational tool. Its price point of ~£30 is likely to have a major impact on both education and reducing the 'digital divide'. What is required is an environment and a number of programming challenges for children to solve collectively through programming/scripting. The Cs4fun website is an excellent source for ideas. You might want to factor in design decisions that would make the system usable in the developing world, in particular without expert assistance.

Dominic Nancekievill Gloucester Research (Dominic.Nancekievill@gresearch.co.uk)
Trading and trading anomaly visualisation

In today's world of high frequency trading, understanding the rapidly changing quotes from a myriad of trading algorithms is a colossal data processing problem. With tens of thousands of updates per second in a single market, visualising this data to better hone these algorithms provides a key competitive advantage. Given a day of actual stock exchange data, your challenge is to create a server to process and aggregate the data, and a set of clients to run on both mobile and desktop platforms. The clients will provide both static and real-time visualisations highlighting interesting features such as rolling averages, peaks of activity and unusual patterns suggesting competitor behaviour.