Cloud Computing Research
Cloud computing has made large-scale distributed computation available to all. Where deployments were previously constrained by the costs of equipment, infrastructure, floorspace, maintenance, power and cooling, anyone can now lease a practically unlimited number of virtual machines (VMs) for ten cents per VM per hour. For the same cost, it is possible to rent ten VMs for one hour, or one VM for ten hours.
In this research project, we are developing new algorithms for distributed computation that exploit the unique characteristics of compute clouds. The goal is to use compute clouds as on-demand supercomputers, but a key obstacle is dealing with the heterogeneity of shared resources. We previously developed spread-spectrum computation for dealing with heterogeneity in the wide area, and we are now evaluating its usefulness on cloud resources such as Amazon EC2 and Windows Azure.
When developing these algorithms, it is crucial to evaluate their performance improvement. However, the dynamic heterogeneity of cloud resources makes it difficult to reproduce experiments under the same conditions. We are left to question whether an improvement in performance is due to the new algorithm, or a sudden decrease in demand for the shared, bottleneck I/O resources. To address this, we are developing nephology, which provides a scientific method for predicting the performance of new cloud applications.
The nephological method for application performance prediction is as follows:
- Measure the I/O performance of several ensembles of cloud resources, to generate cloud traces.
- Characterise the observable behaviour of your application using an appropriate model (finite state machine, hidden Markov model, queuing model, etc.).
- Simulate the application model against the cloud traces to obtain end-to-end performance measurements.
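The three steps above can be illustrated with a minimal sketch. It is not the project's implementation. The trace format (one I/O sample per one-second tick, in megabytes transferred during that tick), the simple model of alternating compute and I/O phases, and the names `Phase` and `simulate` are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Phase:
    """One state of a hypothetical application model."""
    compute_ticks: int   # ticks of pure computation, unaffected by I/O
    io_megabytes: float  # data the phase must transfer before it completes


def simulate(phases: Sequence[Phase], trace: Sequence[float]) -> int:
    """Replay the application model against a cloud trace.

    trace[t] is the number of megabytes the measured resource could
    transfer during tick t (an assumed trace format). The trace wraps
    around if the simulated run outlasts it. Returns the end-to-end
    running time in ticks.
    """
    if not trace or min(trace) <= 0:
        raise ValueError("trace must contain positive I/O samples")
    tick = 0
    for phase in phases:
        tick += phase.compute_ticks           # compute part of the phase
        remaining = phase.io_megabytes
        while remaining > 0:                  # I/O part, paced by the trace
            remaining -= trace[tick % len(trace)]
            tick += 1
    return tick


# Hypothetical application: two phases of compute followed by I/O.
application = [Phase(compute_ticks=2, io_megabytes=100.0),
               Phase(compute_ticks=1, io_megabytes=50.0)]

# Two made-up traces: steady I/O, and I/O with a period of contention.
steady_trace = [50.0] * 10
contended_trace = [50.0, 50.0, 10.0, 10.0, 10.0, 10.0, 50.0, 50.0, 50.0, 50.0]

steady_time = simulate(application, steady_trace)
contended_time = simulate(application, contended_trace)
```

Because the trace is fixed, two versions of an algorithm can be replayed against identical I/O conditions, so a difference in simulated running time reflects the algorithm and not a change in demand for the shared resources.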
We intend to use the above method to evaluate our heterogeneity-aware algorithms. However, our approach has further applications:
- Reproducible experiments. By publishing the source code, application models and cloud traces, we enable other researchers to verify our evaluation results.
- Low-cost experiments. By simulating the application, the developer does not need to run it on real cloud resources (at first). This enables large-scale experiments, based on a large-scale trace, at little cost to the developer, and without placing heavy load on the cloud platform.
- Shared cloud traces. As more researchers gather cloud traces to evaluate their own research, the potential grows for a shared repository of these traces (similar to those used in other fields, e.g. CRAWDAD). A shared repository distributes the cost of gathering cloud traces, and the resulting larger corpus of traces can improve confidence in the simulated results.
Publications
We will add publications, technical reports and other material here as they are written.