[[TracNav(TOC)]] = Programming Large-Scale Networks = == People == * [http://www.cl.cam.ac.uk/users/jjd27/ Jonathan Davies] * [http://www.cl.cam.ac.uk/~arb33/ Alastair Beresford] * [http://www.cl.cam.ac.uk/~am21/ Alan Mycroft] == Introduction == When building wireless sensor networks today, a programmer will typically write code for each individual sensor node separately. This approach to systems building results in early physical binding since the programmer must decide at design-time the places at which sensor data is processed. This can lead to poor utilisation of resources, such as increased compute time, wasted network bandwidth or poor battery life, particularly if later changes to the architecture or application render early decisions about where certain code should be executed inappropriate. Our research aims to separate application code from the set of resources which will eventually execute it. This allows late physical binding of the application since the components of the application can be optimised for execution on available resources after the software has been written or modified. [[Image(scrn2-small.gif, align=right, nolink)]] Our approach can be explained with the aid of a simple example. One common calculation in sensor networks is to find the arithmetic mean of a large number of sensor readings. The centralised approach would gather and sum all the readings at the node where the mean is required and divide by the number of readings received. Partitioning the problem and calculating an average in a distributed fashion across the sensor network, as the data is collected, means that we can reach an answer more quickly (due to parallelisation) and using less energy (due to a lower volume of network traffic) as several additions are executed in parallel. This research was inspired by our [wiki:SentientVehicle Sentient Vehicles project]: In a large-scale network of vehicles containing general-purpose processors and communicating with each other and with the Internet, what is the best way to organise the execution of an application involving the collection, processing and dissemination of sensor data? == Language and Compiler == We have defined a language built on top of Java in which program tasks are defined. We have added some syntactic sugar to Java classes to enable programmers to identify program tasks and the relationships between them to the compiler. Various features of the language are provided for the compiler to determine whether certain program transformations, which can improve the efficacy of an assignment function, are safe to apply. These transformations enable the program to be better-suited to execution in a particular network of processors. An example snippet of code in our language is given below, taken from the application which computes the average of a set of sensor values. {{{ mergeable datatype PartialAv { private double numer; private int denom; public PartialAv() [cpu=0, out size=1] { this(0, 0); } public PartialAv(double numer, int denom) { this.numer = numer; this.denom = denom; } public PartialAv merge(PartialAv a) [cpu=1, out size=sum] { return new PartialAv(this.numer + a.numer, this.denom + a.denom); } processto Average [cpu=1, out size=1] { return new Average(this.numer / this.denom); } } }}} In our current implementation we use [http://www.cs.cornell.edu/Projects/polyglot/ Polyglot] to compile a program written in our language into standard Java source code. The Java source code is then compiled using a conventional Java compiler, producing a jar file or MIDlet for each processor. [[Image(spots-small.jpg, align=right, width=250, nolink)]] == Test Bed == In order to examine and evaluate our ideas in a real-world network, we employ a wireless sensor network of [http://www.sunspotworld.com/ Sun SPOT] devices. We are very grateful to Sun for granting the SPOT development kits to us for use in our research. For a given data-processing application, our compiler produces MIDlets to execute on the available SPOTs which sample the built-in sensors and pass data between themselves over the radio channel. == Publications == Publications arising from our research into programming large-scale networks are listed below; most recent first. === Language-Based Optimisation of Sensor-Driven Distributed Computing Applications === {{{ #!html Jonathan J. Davies, Alastair R. Beresford, Alan Mycroft
11th International Conference on Fundamental Approaches to Software Engineering (FASE 2008), LNCS 4961, pp 407--422. Springer-Verlag, March 2008. }}} In many distributed computing paradigms, especially sensor networks and ubiquitous computing but also grid computing and web services, programmers commonly tie their application to a particular set of processors. This can lead to poor utilisation of resources causing increased compute time, wasted network bandwidth or poor battery life, particularly if later changes to the architecture or application render early decisions inappropriate. This paper describes a system which separates application code from the description of the resources available to execute it. Our framework and prototype compiler determines the best location to execute different parts of the distributed application. In addition, our language encourages the programmer to structure data, and the operations performed on it, as monoids and monoid homomorphisms. This approach enables the compiler to apply particular program transformations in a semantically-safe way, and therefore further increase the flexibility of the assignment of application tasks to available resources. * [http://www.cl.cam.ac.uk/Research/DTG/publications/public/jjd27/fase_language-based-optimisation_davies_camera-ready2.pdf PDF (173 KB)] === Scalable Inter-Vehicular Applications === {{{ #!html Jonathan J. Davies, Alastair R. Beresford
On the Move to Meaningful Internet Systems 2007: OTM 2007 Workshops (Part II). LNCS 4806, pp 876--885. Springer-Verlag, November 2007. }}} Many pervasive inter-vehicular applications involve the collation, processing and summarisation of sensor data originating from vehicles. When and where such processing takes place is an explicit design-stage decision. Often some processing occurs on vehicles, and some on back-end servers, but it is hard for the programmer to optimise this distribution for feasibility or performance. This paper investigates automated task assignment: we define a computational model which captures data aggregation and summarisation explicitly, allowing a compiler to automatically optimise the assignment of processing tasks to particular vehicles and servers. Our model allows a compiler to apply program transformations to data processing, which can further improve task assignment. * [http://www.cl.cam.ac.uk/Research/DTG/publications/public/jjd27/scalable-vehicular-apps.pdf PDF (273 KB)]