Plan-A

*Plan A*
"A coherent virtual IO system for a distributed computer."

Contents

Plan A

NetOS

About the Project
People
Information for members
Relevant Papers and Links.

Introduction

The Plan A project will define an architecture and produce a reference implementation of a coherent virtual IO system for a distributed computer. A distributed computer, in its most general form, is a collection of processing, storage and user IO peripherals interconnected by communication system(s) that supplies the user and administrator with a single system view. The architecture to be developed will aim to exhibit predictability and scalability, together with tolerance to faults, latency and heterogeneity.

Background

The goal of building a distributed computer is not a new one (our italics):

"When the Cambridge Digital Communication Ring hardware was satisfactorily commissioned in 1976-7, it seemed the obvious thing to set up a computer system based on it and exploiting its characteristics. The main intention originally had been to promote peripheral sharing using the Ring. But with the advent of inexpensive microcomputers and the ready availability of minicomputers we were stimulated to consider the use of interconnected machines constituting a coherent system rather than just a collection."

It is important to note that these authors were dealing with a very mixed bag of processing hardware (LSI4, 68K, Z80, CAP) and peripherals (printer, disc, terminals, audio).

Since those heady days, several distinct lines of research have emerged in the general area, e.g., multiprocessing, parallel computing and distributed computing. However, none have really come close to meeting the original goal: multiprocessing assumes a degree of homogeneity in processing and interconnect bandwidth that is often not possible or desirable; parallel computing has concentrated on the application level problem; distributed computing has concentrated on tying systems together at quite a high level.

Our goal is to design a communications architecture that ties together a heterogeneous collection of systems within a single administration domain and supplies virtual device and processor abstractions that can meet the needs of future multi-user operating systems. The architecture should: serve the communication and synchronization needs of both processes (for some definition of process) and devices; be latency tolerant and scalable, and offer predictable performance; enable the effective use of high performance message passing and coherent memory hardware structures as well as work in switched LAN environments; and be secure.

The eventual goal is to use it as a platform for a "single system image" operating system; that is, the users and administrators of the system view the distributed collection of hardware as a single computer. Furthermore, one test of a successful system would be the ability to support multiple OS instances simultaneously.

Starting Points

Several starting points for the project can be identified.

At the most fundamental level is the hardware. The development of various cache coherence implementations, direct access network interfaces, network discs, systems area networks etc. is already well along, while initiatives in secure hardware are in the early stages. Part of the work is to understand how the I2O2 platform would influence the future development of these components, based on the desire for predictable performance, latency tolerance and security.

Predictability: two axioms apply, namely that hardware does not work and that (some) saturated resources are the norm. Failure: in building a system of many components that can fail independently, we need to increase reliability rather than decrease it. The correct approach is to adopt a systematic treatment of failure in the design and implementation, so that system behaviour remains predictable. We wish to include some reasonable analysis of the failure properties of the system under hardware component failure. Utopia would be language support for probabilistic model checking, implemented in a suitable tool that would automate this process; more realistically, we might perform the analysis in a more ad hoc manner and use the experience gained to inform future language design. Performance: bringing to bear expertise in parallel computation is desirable so that applications can be effectively mapped to the system (both at build and run time) and perform within some expected parameters. This analysis overlaps with the failure considerations, as we would wish to understand the behaviour of the system when it is limping as well as when it is fully functional.
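The remark that a system of many independently failing components must be designed to gain reliability rather than lose it can be made concrete with a small calculation. The sketch below is illustrative only and is not part of the project: it assumes every component works independently with the same probability `p`, and the function names are hypothetical. It contrasts a system that needs all of its components with one that tolerates some failures.

```python
from math import comb


def series_reliability(p: float, n: int) -> float:
    """Probability that all n independent components work.

    Models a system that fails as soon as any single component fails.
    """
    return p ** n


def k_of_n_reliability(p: float, n: int, k: int) -> float:
    """Probability that at least k of n independent components work.

    Models a system with redundancy: it survives up to n - k failures.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    p = 0.99  # assumed per-component reliability (illustrative figure)
    for n in (1, 10, 100):
        print(f"all {n} of {n} needed: {series_reliability(p, n):.4f}")
    print(f"any 2 of 3 needed:  {k_of_n_reliability(p, 3, 2):.6f}")
```

Under these assumptions, requiring every component drives reliability down as the system grows, while even modest redundancy pushes it above that of a single component. Real failures are rarely independent or identically distributed, which is one reason a more systematic analysis is called for.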

Latency tolerance: the increasing discrepancy between core processor latencies and main memory latencies (or remote memory latencies in a cluster) must be addressed. Various distribution techniques are singularly inappropriate. Consider RPC or, to be trendy, Remote Method Invocation: these impose what is a language construct upon the system communications architecture at the expense of performance. On the other hand, moving to per-problem asynchronous communication is anarchy. Rather, we need an architecture that involves an economy of concepts (to keep the programmers sane) and follows the age-old discipline of separation of control and data. Relating this to the performance work enables a well-informed engineering decision on where the control/data line is drawn.
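The performance cost attributed to call-and-wait communication can be illustrated with a toy simulation. The sketch below is not the project's architecture: `remote_read` is a hypothetical stand-in for a remote access, and `LATENCY_S` is an assumed round-trip time. It only compares issuing requests one at a time, as a synchronous RPC would, with issuing them all before waiting for replies.

```python
import asyncio
import time

LATENCY_S = 0.02  # assumed round-trip time for one remote access (illustrative)


async def remote_read(block: int) -> int:
    """Hypothetical remote access: wait one round trip, then return fake data."""
    await asyncio.sleep(LATENCY_S)
    return block * 2


async def rpc_style(blocks):
    """Call-and-wait: each request is issued only after the previous reply."""
    return [await remote_read(b) for b in blocks]


async def overlapped(blocks):
    """Issue every request up front, then collect the replies in order."""
    return list(await asyncio.gather(*(remote_read(b) for b in blocks)))


def timed(coro_fn, blocks):
    """Run one strategy to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(blocks))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    blocks = list(range(5))
    for fn in (rpc_style, overlapped):
        result, elapsed = timed(fn, blocks)
        print(f"{fn.__name__}: {result} in {elapsed:.3f}s")
```

With call-and-wait the total time grows with the number of requests, while the overlapped version pays roughly one round trip. The sketch says nothing about where the control/data line should be drawn; it shows only why latency cannot be hidden behind a synchronous call.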

Heterogeneity may be attacked at several levels: today, operating systems supply a degree of hardware independence through defined APIs for each class of device (a block device for filesystems) and/or "open" protocols for communications at some level (e.g. SCSI, USB...). We might aim to achieve processor independence through a virtual machine, with JIT compilation techniques to recover performance. A question to be asked within this project is the degree to which it is sensible to map the low level IO primitives as virtual machine operations and perform JIT compilation to generate native hardware interactions with devices.
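As a minimal sketch of hardware independence through a per-class device API, the code below defines a hypothetical block-device interface with two differently organised implementations, plus client code that works with either. All class and function names, and the 512-byte block size, are assumptions made for illustration and do not come from any real operating system or from the project.

```python
from abc import ABC, abstractmethod

BLOCK_SIZE = 512  # assumed block size in bytes (illustrative)


class BlockDevice(ABC):
    """Hypothetical device-class API: fixed-size blocks addressed by index."""

    @abstractmethod
    def read_block(self, index: int) -> bytes:
        """Return the BLOCK_SIZE bytes stored at block `index`."""

    @abstractmethod
    def write_block(self, index: int, data: bytes) -> None:
        """Store exactly BLOCK_SIZE bytes at block `index`."""


class ListDisk(BlockDevice):
    """Keeps each block as a separate bytes object."""

    def __init__(self, blocks: int) -> None:
        self._blocks = [bytes(BLOCK_SIZE) for _ in range(blocks)]

    def read_block(self, index: int) -> bytes:
        return self._blocks[index]

    def write_block(self, index: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        self._blocks[index] = bytes(data)


class FlatDisk(BlockDevice):
    """Keeps all blocks in one contiguous bytearray."""

    def __init__(self, blocks: int) -> None:
        self._data = bytearray(blocks * BLOCK_SIZE)

    def read_block(self, index: int) -> bytes:
        start = index * BLOCK_SIZE
        return bytes(self._data[start:start + BLOCK_SIZE])

    def write_block(self, index: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        start = index * BLOCK_SIZE
        self._data[start:start + BLOCK_SIZE] = data


def store_text(dev: BlockDevice, index: int, text: str) -> None:
    """Client code: pad the text to one block and write it to any device."""
    raw = text.encode("utf-8")
    if len(raw) > BLOCK_SIZE:
        raise ValueError("text does not fit in one block")
    dev.write_block(index, raw.ljust(BLOCK_SIZE, b"\0"))


def load_text(dev: BlockDevice, index: int) -> str:
    """Client code: read one block from any device and strip the padding."""
    return dev.read_block(index).rstrip(b"\0").decode("utf-8")
```

The client functions depend only on the interface, so either implementation can be substituted without changing them. Whether the same primitives should instead be expressed as virtual machine operations is the open question raised above, and this sketch does not address it.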

People

Computer Laboratory

Steve Hand	Ian Pratt	James Bulpin	Steve McKellar

Microsoft Research Cambridge


Paul Barham	Derek (Mac) McAuley	Jonathan Hardwick	Neil Stratford

Project Information

Information for project members is available on a separate page (includes mailing list information). Access to this page is password protected. Please contact James Bulpin <James.Bulpin@cl.cam.ac.uk> for the password.

Paper Dump

The repository of relevant papers and other materials can be found in the dump directory.

A survey of parallel discrete event simulators, which we might be interested in using for some parts of the project, may be found here.

There are also some DanOS papers available in both postscript and pdf form here.

James.Bulpin@cl.cam.ac.uk
$Id: index.html,v 1.10 2000/10/02 16:20:24 smh22 Exp $