Computer Laboratory

Course pages 2012–13

Concurrent and Distributed Systems

Principal lecturer: Dr Robert Watson
Additional lecturer: Dr Anil Madhavapeddy
Taken by: Part IB
Past exam questions: Concurrent and Distributed Systems, Concurrent and Distributed Systems, Concurrent Systems
Information for supervisors (contact lecturer for access permission)

No. of lectures: 16 (Continued in Lent Term)
Suggested hours of supervisions: 4
Prerequisite courses: Operating Systems, Programming in Java
This course is a pre-requisite for Mobile and Sensor Systems (Part II).

Aims of the Michaelmas Term part of the course

The aim of the course is to introduce concurrency control and distribution concepts and their implications for system design and implementation.

Michaelmas Term Lectures (Concurrency)

  • Introduction; thread models. Overview of properties of distributed and concurrent systems. Software system structure. Occurrence of concurrency in systems. Recap of scheduling and pre-emption. Thread models.

  • Classical concurrency control. Shared data and critical regions. Mutual exclusion and condition synchronisation. Semaphores. Implementation of concurrency control.

  • Classical problems using semaphores. Bounded cyclic buffer (producer(s) and consumer(s)), multiple readers and writers. Problems arising in semaphore programming.

  • Concurrency support in programming languages. Shared data: monitors, pthreads, Java. No shared data: occam, Ada active objects, Erlang, Kilim, tuple spaces. Lock-free programming.

  • Concurrent composite operations. Composite operations in main memory and persistent memory. Dynamic resources allocation and deadlock. Dining philosophers program. Deadlock detection and avoidance.

  • Transactions. ACID properties. Concurrency control and crash recovery. Definition of conflicting operations. Serialisation. Cascading aborts.

  • Database concurrency control. Pessimistic concurrency control: two-phase locking, timestamp ordering. Optimistic concurrency control.

  • Database recovery and summary of “Concurrency”. Write ahead log, undo/redo. Points to take forward.

Objectives

At the end of the course students should

  • understand the need for concurrency control in operating systems and applications, both mutual exclusion and condition synchronisation;

  • understand how multi-threading can be supported and the implications of different approaches;

  • be familiar with the support offered by various programming languages for concurrency control and be able to judge the scope, performance implications and possible applications of the various approaches;

  • be aware that dynamic resource allocation can lead to deadlock

  • understand the concept of transaction; the properties of transactions, how concurrency control can be assured and how transactions can be distributed;

  • understand the fundamental properties of distributed systems and their implications for system design;

  • understand the effects of large scale on the provision of fundamental services and the tradeoffs arising from scale;

  • be familiar with a range of distributed algorithms.

Recommended reading

* Bacon, J. & Harris, T. (2003). Operating systems: distributed and concurrent software design. Addison-Wesley.
Bacon, J. (1997). Concurrent Systems. Addison-Wesley.
Tanenbaum, A.S. & van Steen, M. (2002). Distributed systems. Prentice Hall.
Coulouris, G.F., Dollimore, J.B. & Kindberg, T. (2005, 2001). Distributed systems, concepts and design. Addison-Wesley (4th, 3rd eds.).

Aims of the Lent Term part of the course

The aims of this course are to study the fundamental characteristics of distributed systems, including their models and architectures; the implications for software design; some of the techniques that have been used to build them; and the resulting details of good distributed algorithms and applications.

Lent Term Lectures (Distributed Systems)

  • Introduction, Evolution, Architecture. Fundamental properties. Evolution from LANs. Introduction to the need for naming, authentication, policy specification and enforcement. Examples of multi-domain systems. Why things can get difficult quickly. Enough Erlang to understand subsequent examples.

  • Time and event ordering. Time, clocks and event ordering. Earth time, computer clocks, clock drift, clock synchronisation. Order imposed by inter-process communication. Timestamps point/interval. Event composition; uncertainty of ordering, failure and delay.

    Process groups: open/closed, structured/unstructured. Message delivery ordering: arrival order; causal order (vector clocks); total order. Physical causality from real-world examples.

  • Consistency and commitment. Strong and weak consistency. Replica management. Quorum assembly. Distributed transactions. Distributed concurrency control: two-phase locking, timestamp ordering. Atomic commitment; two-phase commit protocol. Distributed optimistic concurrency control and commitment.

    Some algorithm outlines: Election of a leader. Distributed mutual exclusion.

  • Middleware. Synchronous: RPC, object-orientated. Asynchronous: message orientated, publish/subscribe, peer-to-peer. Event-based systems. Examples of some simple distributed programs in Java and Erlang.

  • Naming and name services. Unique identifiers, pure and impure names. Name spaces, naming domains, name resolution. Large scale name services: DNS, X.500/LDAP, GNS. Use of replication. Consistency-availability tradeoffs. Design assumptions and future issues.

  • Access control for multi-domain distributed systems. Requirements from healthcare, police, emergency services, globally distributed companies. ACLs, capabilities, Role-Based Access Control (RBAC). Context aware access control. Examples: OASIS, CBCL OASIS, Microsoft Healthvault, ... Authentication and authorisation: Raven, Shibboleth, OpenID.

  • Distributed storage services. Summary and roundup. Network-based storage services. Naming and access control. Peer-to-peer protocols. Content distribution. Summary and roundup. Open problems for future years: transactional main memory; multicore concurrency control; untrusted components. Byzantine failure.

Objectives

At the end of the course students should

  • understand the need for concurrency control in operating systems and applications, both mutual exclusion and condition synchronisation;

  • understand how multi-threading can be supported and the implications of different approaches;

  • be familiar with the support offered by various programming languages for concurrency control and be able to judge the scope, performance implications and possible applications of the various approaches;

  • be aware that dynamic resource allocation can lead to deadlock;

  • understand the concept of transaction; the properties of transactions, how concurrency control can be assured and how transactions can be distributed;

  • understand the fundamental properties of distributed systems and their implications for system design;

  • understand the effects of large scale on the provision of fundamental services and the tradeoffs arising from scale;

  • be familiar with a range of distributed algorithms.

Recommended reading

* Bacon, J. & Harris, T. (2003). Operating systems: distributed and concurrent software design. Addison-Wesley.
Bacon, J. (1997). Concurrent Systems. Addison-Wesley.
Tanenbaum, A.S. & van Steen, M. (2002). Distributed systems. Prentice Hall.
Coulouris, G.F., Dollimore, J.B. & Kindberg, T. (2005, 2001). Distributed systems, concepts and design. Addison-Wesley (4th, 3rd eds.).