An introduction to the work is available as a poster we presented at the EPSRC e-Science workshop.
Developers of e-Science software face a particularly harsh programming
environment. Systems are built from heterogeneous collections of
machines, connected over wide-area network links and often maintained
under separate management. If this complexity is to be hidden from
programmers at the point of use, then support is needed for the
complete software development cycle, including compilation, debugging
and profiling in addition to job control and run-time middleware.
In this project we are investigating techniques for software
debugging. We focus on two areas which have received little
attention from the Computer Science community. The first area is
controlling complex multi-process applications through a single
cohesive debugging interface. We can do this by virtualizing the
resources used by the system, thereby allowing the threads that it
involves and the network links that it uses to be modelled as a single
controllable entity. This method will be applicable to moderately
sized systems of perhaps half a dozen nodes.
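As an illustration only, the idea of treating threads and network links as
one controllable entity can be sketched in a few lines of Python. This is not
the project's implementation: the names VirtualSystem, spawn, step and deliver
are hypothetical, threads are modelled as generators, and each network link is
modelled as a queue that holds messages until the debugger chooses to deliver
them.

```python
from collections import deque


class VirtualSystem:
    """Hypothetical single point of control over virtual threads and links."""

    def __init__(self):
        self.threads = {}  # thread name -> generator (the virtualized thread)
        self.inboxes = {}  # thread name -> messages already delivered to it
        self.links = {}    # (src, dst) -> messages still in flight on that link
        self.trace = []    # every scheduling and delivery decision, in order

    def spawn(self, name, body):
        self.inboxes[name] = deque()
        self.threads[name] = body(self.inboxes[name])

    def step(self, name):
        """Run one thread until its next yield; return False once it exits."""
        gen = self.threads.get(name)
        if gen is None:
            return False
        try:
            action = next(gen)
        except StopIteration:
            del self.threads[name]
            self.trace.append((name, "exit"))
            return False
        if action[0] == "send":
            _, dst, payload = action
            # The message is held on the virtual link, not delivered yet.
            self.links.setdefault((name, dst), deque()).append(payload)
        self.trace.append((name,) + action)
        return True

    def deliver(self, src, dst):
        """Move one in-flight message from a link into the receiver's inbox."""
        queue = self.links.get((src, dst))
        if not queue:
            return False
        self.inboxes[dst].append(queue.popleft())
        self.trace.append(("link", src, dst))
        return True


def pinger(inbox):
    yield ("send", "pong", "hello")
    while not inbox:
        yield ("wait",)
    yield ("recv", inbox.popleft())


def ponger(inbox):
    while not inbox:
        yield ("wait",)
    yield ("send", "ping", inbox.popleft().upper())


def run_demo():
    system = VirtualSystem()
    system.spawn("ping", pinger)
    system.spawn("pong", ponger)
    system.step("ping")             # ping sends; message waits on the link
    system.step("pong")             # pong finds nothing delivered and waits
    system.deliver("ping", "pong")  # the debugger releases the message
    system.step("pong")             # pong replies
    system.deliver("pong", "ping")
    system.step("ping")             # ping receives the reply
    return system.trace
```

Because every thread step and every message delivery passes through the one
VirtualSystem object, a debugger built on it could pause, reorder or replay
the whole application from a single interface, and the recorded trace gives
one global ordering of events.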
The second area is post-deployment debugging of very large-scale
distributed applications -- for instance those running over hundreds
or thousands of nodes. In such a setting, traditional distributed
debugging techniques such as checkpointing or simulation become infeasible.
This work is supported by the EPSRC grant "Pervasive debugging" and by
an Eclipse Innovation Grant from IBM.
Dependable computing needs pervasive debugging
Proceedings of the 2002 ACM SIGOPS European Workshop
8th CaberNet Radicals Workshop
Ajaccio, Corsica, October 2003