Department of Computer Science and Technology

Technical reports

Debugging concurrent and distributed programs

Robert Charles Beaumont Cooper

February 1988, 111 pages

This technical report is based on a dissertation submitted December 1987 by the author for the degree of Doctor of Philosophy to the University of Cambridge, Churchill College.

DOI: 10.48456/tr-128

Abstract

This thesis aims to make one aspect of distributed programming easier: debugging. The principles for designing and implementing an interactive debugger for concurrent and distributed programs are presented. These programs are written in a high-level language with type-checked remote procedure calls. They execute on the nodes of a local computer network and interact with the other programs and services which exist on such a network.

The emphasis is on debugging programs in the environment in which they will eventually operate, rather than some simulated environment oriented specifically to the needs of debugging. Thus the debugging facilities impose a low overhead on the program and may be activated at any time.

Ideally the actions of the debugger should be transparent to the execution of the program being debugged. The difficult problem of avoiding any alteration to the relative ordering of inter-process events is examined in detail. A method of breakpointing a distributed computation is presented which achieves a high degree of transparency in the face of arbitrary process interactions through shared memory.

The problems of debugging programs that interact with network services, which are shared concurrently with other users of the distributed environment, are examined. A range of debugging techniques, some of them directly supported by the debugger, is discussed.

A set of facilities for debugging remote procedure calls is presented, and the functions required of the operating system kernel and runtime system to support debugging are also discussed. A distributed debugger is itself an example of a distributed program and so issues such as functional distribution and authentication are addressed.

These ideas have been implemented in Pilgrim, a debugger for Concurrent CLU programs running under the Mayflower supervisor within the Cambridge Distributed Computing System.

Full text

PDF (5.9 MB)

BibTeX record

@TechReport{UCAM-CL-TR-128,
  author      = {Cooper, Robert Charles Beaumont},
  title       = {{Debugging concurrent and distributed programs}},
  year        = 1988,
  month       = feb,
  url         = {https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-128.pdf},
  institution = {University of Cambridge, Computer Laboratory},
  doi         = {10.48456/tr-128},
  number      = {UCAM-CL-TR-128}
}