Concurrent Systems: Instructor’s Guide

Concurrent Systems: Instructor’s Guide
Contents

Introduction	Outline of the guide	2
Part I	Curriculum design	3
Part II	Points to emphasis and teaching hints	9
Part III	Solutions to exercises	39
Part IV	Some past exam questions	112
Part V	Transparency list and transparencies	115

Introduction

Thank you for your interest in Concurrent Systems. The objective of this guide is to provide additional information to assist teachers who are using the book. This material is presented in six parts:

Curriculum design. An outline of parts of the ACM/IEEE-CS "Computing Curricula 1991" is given for reference throughout this guide. I have suggested how Concurrent Systems might be used in the curriculum.
Points to emphasise and teaching hints. For each chapter I have identified important points and potential difficulties.
Solutions to exercises and a few additional exercises. The solutions include examples of how the various designs that are discussed have been used in practice.

A general point is that Concurrent Systems emphasises concepts that are traditionally taught in operating systems or database courses but are more widely applicable. Some of the detailed material that is of interest only to OS designers has been omitted from the main text. I have used some of these topics in the exercises.

Exam questions in the area that have been set at Cambridge in the past few years.
Overhead transparency masters.

I should like to learn from the experience of others who use Concurrent Systems in their teaching. In particular, it would be useful to share suggestions for practical assignments. The computing environments students use vary considerably and it is impossible for one person to suggest universally applicable project work.

I have set up an email distribution list:

Bacon-ConcurrentSystems-Book@cl.cam.ac.uk

so that teachers using the book can talk to each other. If you would like to be added to the list send your request to

Bacon-ConcurrentSystems-Book-request@cl.cam.ac.uk

I should also be grateful for suggestions for improvements of any kind:

Jean.Bacon@cl.cam.ac.uk

or mail can be sent to me at:

University of Cambridge Computer Laboratory

New Museums Site

Pembroke Street

CAMBRIDGE

CB2 3QG

Part I

CURRICULUM DESIGN

1.Aspects of the ACM/IEEE Curricula for Computer Systems Courses

This section gives a brief summary of the core curriculum in computing which is used for reference in this guide. It also suggests how the material presented in Concurrent Systems might be used in the curriculum.

The full document is:

TUCKER A. B. et al. (editor and co-chair)

Tucker A B, Barnes B H (co-chair), Aiken R M, Barker K, Bruce K B, Cain J T, Conry S E, Engel G L, Epstein R G, Lidtke D K, Mulder M C, Rogers J B, Spafford E H, Turner A J.

"Computing Curricula 1991 Report of the ACM/IEEE-CS Joint

Curriculum Task Force" ACM press, IEEE press 1991, ACM order number 201910, ACM ISBN number 0-8979-381-7,

IEEE Computer Society Press order number 2220, IEEE Computer Society Press ISBN number 0-8186-2220-2

It is summarised in:

TUCKER A. B. et al. (ed.), (1991)

"A summary of the ACM/IEEE-CS joint curriculum task force report

Computing Curricula 1991" Comm. ACM 34(6) June 1991

An earlier introduction to the philosophy behind the curriculum design is given in:

DENNING P. J. et al., (1989)

Denning P J, Comer D E, Gries D, Mulder M, Tucker A, Turner A J, Young P R.

"Computing as a Discipline" Comm. ACM 32(1) Jan 89

The reader is referred to the document for a full discussion but the following quotations indicate the importance of an underlying theme and related concepts:

"Recurring concepts are significant ideas, concerns, principles and processes that help to unify an academic discipline at a deep level" p12

"From the instructor’s perspective (and also from the student’s perspective), a course is rarely satisfying unless there is some "big idea" that seems to hold disparate elements together" p15

".... portray computing as a coherent discipline rather than a collection of unrelated topics." p15

2. An outline of the common core curriculum in computing

The common core curriculum comprises courses in the following subject areas:

AL Algorithms and Data Structures

AR Architecture

AI Artificial Intelligence and Robotics

DB Database and Information Retrieval

HU Human Computer Communication

NU Numerical and Symbolic computation

OS Operating Systems

PL Programming Languages

PR Introduction to a Programming Language (optional)

SE Software Methodology and Engineering

SP Social, Ethical and Professional Issues

Each area comprises a number of knowledge units. Those marked (*) have direct relevance to Concurrent Systems. Some contain prerequisite material, some related material and some are covered in the book.

AL Algorithms and Data Structures (approximately 47 lecture hours)

AL1* Basic data structures

AL2* Abstract data types

AL3 Recursive algorithms

AL4 Complexity analysis

AL5 Complexity classes

AL6* Sorting and searching

AL7 Computability and undecidability

AL8* Problem-solving strategies

AL9* Parallel and distributed algorithms

AR Architecture (approximately 59 lecture hours)

AR1 Digital logic

AR2 Digital systems

AR3* Machine level representation of data

AR4* Assembly level machine organisation

AR5* Memory system organisation and architecture

AR6* Interfacing and communication

AR7* Alternative architectures

AI Artificial Intelligence and Robotics (approximately 9 lecture hours)

AI1 History and applications of artificial intelligence

AI2 Problems, state spaces and search strategies

DB Database and information Retrieval (approximately 9 lecture hours)

DB1* Overview, models and applications of database systems

DB2 The relational data model

HU Human-Computer Communication (approximately 8 lecture hours)

HU1* User interfaces

HU2 Computer graphics

NU Numerical and symbolic computation (approximately 7 lecture hours)

NU1 Number representation, errors and portability

NU2 Iterative approximation methods

OS Operating Systems (approximately 31 lecture hours)

OS1* History, evolution and philosophy

OS2* Tasking and processes

OS3* Process co-ordination and synchronisation

OS4* Scheduling and dispatch

OS5* Physical and virtual memory organisation

OS6* Device management

OS7* File systems and naming

OS8* Security and protection

OS9* Communications and networking

OS10* Distributed and real-time systems

PL Programming Languages (approximately 46 lecture hours)

PL1 History and overview of programming languages

PL2 Virtual machines

PL3 Representation of data types

PL4 Sequence control

PL5 Data control, sharing and type checking

PL6* Run-time storage management

PL7 Finite state automata and regular expressions

PL8 Context free grammars and pushdown automata

PL9 Language translation systems

PL10 Programming language semantics

PL11* Programming paradigms

PL12* Distributed and parallel programming constructs

SE Software Methodology and Engineering (approximately 44 lecture hours)

SE1 Fundamental problem solving concepts

SE2 The software development process

SE3* Software requirements and specifications

SE4* Software design and implementation

SE5 Verification and validation

SP Social, Ethical and Professional Issues (approximately 11 lecture hours)

SP1* Historical and social context of computing

SP2* Responsibilities of the computing professional

SP3* Risks and liabilities

SP4 Intellectual property

3. The advanced and supplemental curriculum

The following subject areas are suggested by ACM/IEEE for advanced courses following from the core:

Advanced operating systems
Computer communications networks
Computer security
Database and information retrieval
Fault tolerant computing
Parallel and distributed computing
Real-time systems

4. Concurrent Systems in the curriculum

The way a subject area is presented must depend primarily on the interests of the staff involved.

~~FIGURE 1~~

Figure 1 shows the intended use of the material in Concurrent Systems in the Computer Science curriculum. We present concepts that are important to a number of systems areas (what everyone should know) then specialist courses in those areas (what only specialists need to know). Concurrent Systems gives the flavour of the specialist courses by building up to the important issues.

For example, a course in large scale distributed systems can assume knowledge of basic RPC and can focus on higher level distributed algorithms; courses on computer networks can assume that the students have covered network interfaces and I/O and are comfortable with the implementation of services and protocols as layered, multi-threaded software.

Operating Systems is a substantial part of the ACM core material. Some OS topics are best taught at the specialist level, perhaps in a performance analysis or modelling context. Examples are the layout of blocks on disks to optimise performance and the performance of paging algorithms. Everyone does not need such topics at an early stage. Because operating systems are close to the hardware they are likely to need to change in response to changes in technology. We should not embed approaches that are closely tied to current technology in a first course.

~~FIGURES 2 and 3~~

Figures 2 and 3 show how the material in Concurrent Systems is presented in traditional courses on OS. The material on deadlocks may be taken at any stage and is shown in Figure 2 in the context of resource allocation by an OS. Sometimes, concurrency is taken early as shown in Figure 3. Sometimes a separate, practically based course on concurrent programming is given, but this would often use only one concurrent programming language. Chapters 1, much of 2, 4, 8 and 12 are not usually emphasised in traditional courses but provide a coherent framework for the presentation.

Most OS texts are now expanding to include "Distributed Systems". They cover material I have presented in Chapter 1 (system topology), 3 (the ISO OSI model), 13 and 15 (message passing and RPC). They also tend to include distributed algorithms (Chapter 22).

A separate, advanced course on large-scale distributed systems could cover naming, location, placement, replication, protection, authentication and encryption. It is in such a course that the algorithms that I reach at the end of Chapter 21 (two-phase commit and distributed validation) and in the appendix (distributed mutual exclusion) can best be studied. Concurrent Systems builds up to such algorithms and emphasises the special characteristics of distributed systems. They are included to give the flavour of what is involved in an in-depth study of distributed systems.

Perhaps the most novel aspect of Concurrent Systems is the material in Chapters 18-21. This would not be covered in most current courses on OS, although I have attempted to show the relevance of the concepts to OS and concurrent programming. It is, however, beginning to be covered both in advanced courses on Distributed Systems (because we need transactions to handle partial failures) and in Distributed Database (where transactions are distributed). I feel this material should be extracted as a common core topic. As a long-term OS specialist I have, to some extent, "stolen" this material from colleagues specialising in database. A solution is to involve both OS and DB specialists in the teaching and by this means to incorporate experience of the DB application area. It is important not to lose experience and intuition of the needs of this and other application areas.

From the point of view of students it can be confusing to be taught "concurrency" in both OS and DB courses. It is important to ensure that the treatment is consistent.

Part II

POINTS TO EMPHASIZE AND TEACHING HINTS

Please note that references are made in this part to prerequisite material and relevant material in related areas. The notation used is taken from the ACM/IEEE-CS Curriculum definition given in Part I.

Chapter 1 Introduction: Examples and Requirements

Objectives

To define concurrent systems in an intuitive way. To give examples of different kinds of concurrent system. To discuss their essential properties and establish requirements for their design.

Points to emphasise

Point out the rate of development of technology in the past and projected for the next 40 years (the working life of the students). Computer scientists have to build software systems that exploit technology.

The requirements for implementing the various types of concurrent system have a great deal in common although the systems may vary considerably in applications and scale. The summary of requirements in Section 1.5 are referred to throughout the book.

Real-time doesn’t necessarily mean very fast.

Point out the differences between emerging multimedia real-time systems and classic process control systems. The two applications share the distinction between synchronous and asynchronous events. A lot of research is happening in this area and operating system design is moving this way.

Possible difficulties

There is a lot of material in Chapter 1. The aim is to draw on the experience and prior study of the audience, and suggest where we are heading, rather than teach these things for the first time.

Teaching hints

It may be useful to show Figures 1.11 and 1.12 early to show "where we are heading".

The introduction should give the feeling that the topic is up-to-date and exciting. A long description should be avoided and interaction, drawing on the students’ experience, should be used if possible. The instructor may have other examples that the students already know from other courses. Examples of disasters are often useful or of financial transactions. The students should feel that they want to know about this area and should know about it because it is central.

At this stage, the discussion of operating systems is in very general terms. It is best not to spend too long on them here but to give the introduction to OS concepts later, with Chapter 2.

The idea is not to teach Section 1.3 in detail as this is the subject matter of AR7. The material is summarised here to show its relevance. It might be covered briefly in lectures or be left for private reading.

This generation of students may be able to think parallel quite naturally. Don’t assume that older computer scientists’ serial view of how things are done is held by the students.

PART I BACKGROUND AND FUNDAMENTALS

Chapter 2 System Structure and Dynamic Execution

Objectives

To set up the concepts relating to modular software structure and its dynamic execution by processes. To show that OS support-functions are fundamental.

Points to emphasise

We use the concepts of modules, abstract data types and objects that students study in PL and SE as a basis for studying and reasoning about the structure of all software systems.

Important concepts are interface and implementation; information hiding and encapsulation of data. I have also touched on the "meaning" of any module.

We need something in addition to modular structure in order to comprehend the behaviour of a system in time. Hence the concepts of process and protocols for defining the sequences of actions of processes.

We focus on operating system (OS) structure not because the course is just about OS but because an OS is a concurrent system, it contains concurrent subsystems and it may support applications which are concurrent systems. Knowledge of what OS’s do is fundamental. Although a concurrent system, such as a process control system, may not run on a conventional, general purpose OS such as MS-DOS it needs certain basic support functions that operating systems provide. That is, we focus in Part I on the basic functions provided by any OS.

A modular approach allows us to configure or distribute a system (microkernel + required services) we need for any application.

Possible difficulties

The students may have a conceptual model of a general purpose operating system. Some may be pleased at being a UNIX or MS-DOS guru and may have taken the course because it’s something they feel good at already. They may even look down on an instructor who doesn’t know every detail of every aspect of the locally used OS! It’s important to set up a basis for reasoning about OS which is not based on detailed factual knowledge of one of them.

Teaching hints

Relate the discussion of modules to whatever the students have studied in PL and SE. There is no point in this being the first place they hear about advanced type systems. The concept of interface and implementation is what we shall build upon. This can be reinforced by many everyday examples of black boxes with knobs on the outside for invoking functions. A UK example is the BBC (Black Box Company).

Ask then to think up more analogies, similar to "a book and reading it" of a static specification and dynamic processing. Dream up some bizarre ones.

Justify the need for a protocol. You should invoke write to put data into a file before you invoke read. Your system may have the protocol open, read/write, close as the protocol for file access. Think of everyday analogies.

The discussion of microkernels may help to give an up-to-date feel to the course. Microkernel design is an active research area.

Chapter 3 The hardware interface, I/O and communications

Objectives

To study the mechanisms for interaction between a software system and the outside world. To set up real instances of the occurrence of events and the need for synchronisation.

Points to emphasise

Process-hardware synchronisation at the device interface level.

Process-process synchronisation between client requests (top down) and the device-handling level (driven bottom up by hardware events).

The need for buffers for DMA interfaces and between client and device driving software.

Similarities and differences between dedicated, shared and communications device interfaces (speed, throughput, unit of transfer) and the higher levels of software associated with handling them.

Possible difficulties

There is a lot of material here.

Communications handling belongs here because the device interfaces can easily be taught with those for dedicated devices and shared devices. The danger is that too much time may be spent on re-teaching hardware.

Communications software, again, could be taught in great detail and take up a great deal of time. The difficulty is to convey the essential concepts fairly briefly but without being superficial.

Teaching hints

Device interfacing and interrupt handling will have been taught elsewhere: AR6. It is probably worth revising polling and interrupt handling. The machines (and network) with which the students are familiar could be used rather than the M68000 and R2000 examples in the book. It is not essential to cover both CISC and RISC. It is useful to cover communications interfaces at the same time.

Facts and figures comparing, for example, disk and network interfaces could be given. How often does the software have to respond to hardware events? How much data can be dumped in your memory from across the network? What happens if you run out of buffer space? Convey the idea of concurrent hardware events to which the software must respond and more concurrent events caused by clients’ requests.

One should not attempt to cover in detail the implementation of the layers of communications software, for example the use of buffering between layers. The concepts of a service interface and a protocol between peers at a given level can be explained. Examples of applications protocols rather than layer protocols could be used. Examples such as a mail service or an FTP service could be used.

Chapter 4 Support for Processes

Sections 4.1 through 4.11 were Chapter 6 of edition 1.

Sections 4.12 through 4.16 were Chapter 7 of edition 1. These topics are treated separately below, as in the Instructor’s Guide for Edition 1.

Support for processes in operating systems (Sections 4.1 through 4.11)

Objectives

To show the concrete implementation of the process abstraction. To show how processes are scheduled to run on processors. To introduce the special requirements of multiprocessor and real-time systems.

Points to emphasise

This is a key chapter. Chapters 3, 4 and 5 have discussed OS functions.

Processes provide and invoke these functions - processes make things happen.

The idea of the process abstraction and its implementation. Focus on a process executing user-level code then consider the management mechanisms that make this possible.
Pre-emptive and non-pre-emptive scheduling. Note that interrupts are taken by a non-pre-emptively scheduled process but control returns to it whatever process might have been unblocked by the interrupt.
Although this chapter doesn’t say much about "threads" it is important to introduce the concept. Chapter 4 covered processes sharing part of an address space. Threads share all the resources of a "process" including the whole address space. Note there is no protection between threads.

Possible difficulties

The basic material on implementing and scheduling processes should not present any difficulties. There are tricky questions like "what executes the process management module" which one should be prepared for but are rarely asked.

One could spend a great deal of time on sophisticated scheduling algorithms with associated performance analysis, but I feel this is best done in the context of an advanced course on specific types of system. For example, real-time systems have special requirements which are only hinted at here. Future systems which support continuous media will handle scheduling quite differently from current systems. One shouldn’t be dogmatic about the way things are and must be.

Teaching hints

The design of a process implementation draws on previous material. A process has a memory allocation, needs to be informed of asynchronous events and errors, may need to synchronise with the hardware to perform I/O, may be blocked or runnable, may have open files and so on. The students could deduce all of these things for themselves.

Consider the interface of a process management module. Discuss the operations that one would wish to invoke on a single process: create, delete, start (put in scheduling queues), stop (remove from scheduling queues), block, unblock, set priority, dispatch, remove from a processor and so on. Discuss any operations that involve all processes, such as schedule.

A concrete example using the instructions of a machine well-known to the students could be used to show how a process’s state is set up in the hardware and how control is passed to it.

When threads are introduced, rework the process implementation sections for threads. Discuss what the OS does when switching between threads of the same process and between threads of different processes.

Chapter 4 continued

Support for processes in language systems (Sections 4.12 through 4.16) and integration of language system, operating system support (Section 4.17).

Objectives

To show how concurrency is supported at the language level. To show the possible relationships between language level processes and operating system processes.

Points to emphasise

Those aspects of the state of a process that are of interest to the OS, those that are of interest to the language-level support.

The differences between and similarities of co-routines and processes.

When it would be appropriate to use each.

Even though your language lets you write concurrent processes, they may not be able to run in parallel or respond to asynchronous events.
In order to understand how a language level concurrent program behaves, you must know the process model of the underlying OS (the OS may not be able to know about more than one process per program) and whether system calls are synchronous (potentially blocking) or asynchronous.
If the OS supports multi-threaded processes the processes you write in your program may be supported by the OS and scheduled independently. If scheduling is pre-emptive, a thread may be scheduled immediately an event occurs. You must be able to assign priorities to your threads to arrange for this to happen.

Possible difficulties

Some students may not have experience of a concurrent programming language or an operating system which supports dynamic process creation.

Students tend to find it difficult to see the differences between co-routines and processes.

Teaching hints

Useful prerequisite material is PL6.

A laboratory session showing a language with support for concurrency would be useful at this stage, before IPC and related problems are discussed. The idea of being able to specify independent threads of control and have them executed in parallel should be reinforced.

An exercise showing dynamic creation of UNIX processes could be used to reinforce the OS, separate-address-space-per-process model. For example, set up three commands with a pipe between each pair and discuss what is happening.

Emphasise what you can and can’t do with a concurrent language running on an OS with whatever is available locally.

Chapter 5 Fundamentals of Distributed Systems

Objectives

To introduce at an early stage how to reason about the nature of distributed systems. Each topic that is studied subsequently can then be generalised, to consider it in the context of a distributed system.

Points to emphasise

The architectural variety of distributed systems. The fact that most systems we use today are networked so the software is distributed.

An object model can be implemented in a distributed world, refer to section 2.7.

Layers of abstraction help one reason about systems design. Refer to Section 3.9.

The fundamental properties - Section 5.5.

The non-deterministic nature of distributed systems.

Possible difficulties

Taking distributed systems early broadens the scope of study. It is feasible to cover Sections 5.1 through 5.5 first time through and return to time and naming at a later stage, perhaps with Chapter 22. It is important to cover enough to make the progression to distributed implementations natural for filing, IPC, transactions etc.

Teaching hints

It’s easy to contemplate an object model where objects are distributed. One can then reason about what must be provided in order to support this abstraction - one invents the necessary architectural components such as name services for name to location binding, communications support, access control etc.

Take familiar systems such as email or the web and investigate the names that are used and how communication is supported.

The main thing is that students understand the fundamental differences between centralised and distributed systems.

Chapter 6 Memory Management

Objectives

To introduce the concept of the address space of a process. To show how processes can share memory, in particular, writeable data. To study the memory management function of OSs. To show how virtual to physical address translation and memory protection are supported by hardware.

Points to emphasise

The concept of the virtual address space of a process is fundamental.

Segmentation is concerned with a logical division of the virtual address space. Paging is concerned with physical memory management and is transparent to the OSs client software.

Segmentation supports relatively fine-grain sharing and protection.

The hardware knows about pages (fixed size blocks). Segments may be known to the hardware or be managed only by the OS.

The memory management function has to respond to both hardware events and requests from clients for memory management services.

There may be multiple copies of shared, writeable data in the caches of a multiprocessor.

Possible difficulties

If this topic is treated in depth here it can hold up the development of the main theme. My students have had an outline of basic OS functions before coming to concurrent systems (CS). I cover memory management in detail in a course which follows CS. At that stage the students are more able to understand issues, trade-offs and potential directions of development.

Teaching hints

AR7 covers space allocation, the memory hierarchy, virtual memory and cache memory. This knowledge can be revised briefly, used and extended for an OS context.

Any examples used in AR7, or an example of an MMU in a machine used by the students, can be used in place of the R2000 example in the book.

Explore the implications of having a large virtual address space (say 32 bits, 4Gbytes). Get the students to work out how large a process page table could be for some specific page size, say 1Kbytes. Ask for suggestions on how to rationalise this space requirement. Lead up to the idea of a page table for main memory and process page tables for backing store, held on backing store.

Explore the implications of having a large amount of physical memory.

How much longer will paging out be needed? What are the implications of a crash in which main memory is lost?

Work out how long memory to memory copying takes of data blocks of, say, 4K, 8K, 16K, 32K bytes in size (use the instructions and timings of a machine well-known to the students). Get the students to think about using memory management for moving data buffers (I/O and comms.) between the system and processes and between processes.

Chapter 7 File Management

Objectives

To study the file service function of an OS as the lowest level provider of persistent storage. To consider typical file service interfaces and implementations. To study how the functions of a file service might be distributed.

Points to emphasise

Naming. Pathnames and unique file identifiers. The file identifiers (sometimes called handles) given to users may or may not be the same as those used internally by the filing system.

Sharing. Access control may be fine grained and complex to implement or crude and simple. Why it is useful to have links in addition to access control.

Discuss the semantics of "write" and the trade-offs in its implementation. You can do a fast, unsafe write or a slow safe one.

The file system implementor has to be concerned with the consistency of data structures in main memory and those on disk.

There are environments for filing systems other than a university campus.

The I/O bottleneck. Directions in which file services are likely to evolve.

Possible difficulties

The student may find it difficult to understand general issues when the local OS is very familiar. It may be useful to emphasise different application environments and requirements on future systems to store different types of data, such as multimedia, to create an open minded attitude and an approach based on reasoning rather than local knowledge.

Teaching hints

Directories are a good example of (persistent) abstract data types. Who is the owner of a directory? Why can’t the owner read and write an arbitrary byte sequence from/to a directory? The owner is a client of the DS which is a client of the "byte-sequence file service" (FS).

The DS knows the internal structure of directories which it stores as files on the FS. The FS does not know the internal structure of any files stored on it. Ask the students for other system services or utilities that use the FS in this way and themselves have clients. (e.g. mail, database management systems, management utilities which write error logs, spreadsheet packages).

Ask the students to focus on what is in main memory and what is on disk.

Consider a crash in which the contents of main memory are lost. Suppose a write request requires that several blocks are written out to disk and a crash occurs part way through. Suppose a crash occurs between the writing out of metadata and data.

Compare the projected performance n years ahead of disks with that of processors. How will future filing systems cope? Consider large client caches in main memory and on disk. How would the shared file service be used?
Give the example that a one-minute video clip occupies 12.3 Mbytes and requires a sustained data transfer rate of 1.5 Mbits/s. Emphasise that the typical 4Kbyte block size and, for example, a table of pointers, with indirection after the first few pointers, is unsuitable for this kind of file.

PART II SINGLE CONCURRENT ACTIONS

Chapter 8 System Structure

Objectives

To show how processes can be used in systems. To motivate the need for various forms of interaction among processes. To introduce the potential problems when interactions take place. To define the concept of atomicity.

Points to emphasise

Emphasise, by means of concrete examples, the need for multi-threaded servers in system design. Note that threads share writeable data.

The need for processes in separate address spaces within a machine and on different machines.

The need for interaction between threads of a process (using shared data) and between processes which do not share data.

A number of different interaction patterns among processes might be required by software systems that are to make use of the IPC facility provided by an OS.

If access to shared data by concurrent processes is not controlled, errors can occur.

The concept of a single atomic, indivisible operation on an object. How do we achieve atomicity? (These last two points set up the detailed study in the next two chapters).

Possible difficulties

The limited experience of students might make these issues appear obscure rather than of central relevance to system design.

Teaching hints

Give examples from the students’ experience wherever possible. Does their computing environment contain shared file servers? How many users are typically logged on at any time? How many are typically using a file server at any time? Recall the shared system tables in the file service implementation in Chapter 7.

On process interactions: broadcasting a message that a system is about to go down could be used as an example. If a separate, acknowledged message was sent to everyone, the system might have crashed before the last user was notified.

Discuss the operations on objects that the hardware makes atomic, e.g. read an integer, write an integer. Take simple examples of data structures and discuss their implementation as objects with interface operations. Analyse the number of hardware-atomic operations comprising each interface operation.

It may be useful to spend a while at this stage looking at the difference between the multiprocessor execution of "book a seat" and the uniprocessor with pre-emption implementation Figure 8.7. It’s worth noting that the OS has no knowledge of the language level process state and the countdown of a process’s time is independent of what the process is doing. Note that reading or writing a given location in memory is atomic.

Chapter 9 Low Level Synchronisation Primitives: Implementation

Chapter 9 of edition 1 has been split into two chapters and the part of the appendix of edition 1 on mutual exclusion without hardware support is now Section 9.3.

Objectives

To show how an operation on a data object shared by concurrent processes can be made atomic.

Points to emphasise

We have seen many examples of shared data objects in systems implementation, such as I/O and communications buffers, memory management tables and file system tables. These are accessed by concurrent processes.

A system must be able to enforce mutually exclusive access to shared, writeable data objects. The hardware may provide support and this approach is the first to consider for efficiency and simplicity.

Forbidding interrupts to achieve mutual exclusion is a possible option for a single-CPU system only. This approach is simple and easy and should not be scorned if the timing requirements on the system allow it to be used and a multiprocessor implementation is not required.

A composite read-modify-write instruction can be used to achieve mutual exclusion on a multiprocessor (Figure 9.10). These are often available, both in CISC and RISC architectures.

The semaphore data type is a formalisation of the flag or Boolean we have been attempting to use to solve our concurrency problems.

Possible difficulties

The material is substantial and difficult. In my experience it can’t be understood unless an individual spends a lot of time in personal study working through the algorithms. The distinction between kernel level and language level implementation of semaphores may present difficulties.

Teaching hints

I have now covered the algorithms for achieving n-process mutual exclusion without hardware support within the chapter. Section 9.3 attempts to guide the reader slowly and carefully through two of these algorithms and makes them consider parallel execution (on a multiprocessor or uniprocessor with pre-emptive scheduling) by means of exercises. The original CACM papers are short, often only one page, and students might be interested to see the liberal use of goto’s in the 60’s and early 70’s.

Look at the instruction sets of local machines for composite instructions or other support for concurrency such as locking the memory bus for a composite read-modify-write. A posting to the newsgroup comp.arch showed ho* the branch delay slot of the MIPS R2/3000 can be used to create a composite instruction, in the absence of any other support for concurrency control on that machine.

Chapter 10 Low level Primitives: Use in Systems and Languages

The case study with exercises - management of a disk block cache has been brought from the appendix of edition 1 into Section 10.10. The POSIX threads package has been added as Section 10.8 as a widely used implementation of mutual exclusion and condition synchronisation.

Objectives

To give examples of the use of semaphores, then event counts and sequencers, to solve problems that require both mutual exclusion and condition synchronisation, sometimes on more than one condition.

Points to emphasise

When we have implemented semaphore operations atomically we can use them to solve higher level problems.

The difficulty of understanding concurrent programs and writing correct ones.

Eventcounts and sequencers are presented as an alternative to semaphores.

Some of the analysis in the original paper was superseded by Lamport’s paper (1978).

Possible difficulties

Presenting concurrent programs to large classes. Students might not appreciate the difficulties if solutions to the classic problems are only presented in lectures. Some additional private work is needed.

Teaching hints

The semaphore solution to the readers and writers problem illustrates the difficulties of counting how many synchronising signals to send when one doesn’t have a monitor-like structure to simplify matters. The high level language solutions of Chapter 11 are easier to understand and the semaphore solution illustrates is useful to look back on for purposes of comparison.

Consider in advance whether it possible for a compiler to provide more support than we have when using only semaphores.

Chapter 11 Language Primitives for Shared Memory

Objectives

To show how concurrent programming languages support process interactions based on shared memory.

Points to emphasise

The development of programming languages with modular structure was taking place at the same time as concurrent programming languages.

Monitors make operations on objects atomic in the presence of concurrency. Synchronisation over the state of the object is needed in addition to exclusive access to it. The semantics of the WAIT and SIGNAL operations on condition variables are different from the semantics of semaphore operations. They take place within procedures executed under exclusion.

Monitors encapsulate a single shared object. A different problem is to provide concurrency control for many objects of the same type.

Synchronisation at the granularity of operations is desirable. SIGNAL and WAIT on condition variables is low level and higher level support is desirable. This has been achieved to some extent by path expressions for passive objects but more generally by guarded commands for active objects.

Possible difficulties

To make the topic a reasoned development rather than a description of programming language syntax. The details are needed as a basis for comparison, but comparison must be at the semantic level. Students are confused by the fact that semaphore operations and condition variable operations are different. (You always block on a WAIT (c.v.) and SIGNAL (c.v.) has no effect if the queue is empty).

Teaching hints

Pose a series of questions:

How can a compiler make the concurrent programmer’s life easier than just providing semaphores?

A critical region associated with some shared data is an obvious first step. Is this enough? No - how do we synchronise over the state of the shared data?

Conditions on arbitrary expressions on shared variables were tried but are inefficient to implement. Allow the programmer to declare condition variables and use SIGNAL and WAIT operations on them. This is more efficient but still low level, as hard as semaphores to get correct.

How about synchronising at the level of operations? How could you implement that for a passive structure like a monitor? Path expressions were tried (Path Pascal). Still difficult to get right and you can’t build in fully general dynamic, runtime behaviour.

Why not make the "monitor" active and let an internal process decide which invocations to accept? Dijkstra’s guarded commands can be used as the basis for this (see Ada).

Suppose there are many objects of a given type. How do you allow processes to access different objects concurrently but only one to access a given object at one time?

Give the students practical experience if possible. It would be good if a number of different concurrency constructs could be tried. See Part V of this guide - suggestions for project work. SR and Pascal FC provide a suitable environment.

Chapter 12 IPC and System Structure

Objectives

To bring the reader back to a high level view of IPC within system structure after the detailed study of shared memory based interactions of the previous two chapters. To introduce IPC when there is no shared data. To contrast IPC with and without shared data and to show that the same problems can be solved in both contexts, given suitable concurrency facilities.

Points to emphasise

IPC both with and without shared memory is needed in systems.

The concurrency features we studied in Chapters 9 and 10 were hiding the low level mechanisms from the programmer and combining concurrency and modularity. An abstract operation on an object can be made atomic in the presence of concurrency.

The new problem to address is how to solve problems without shared memory. We can pass data around from process to process or we can set up processes to manage data. Note here that we have come to active objects again but from a different direction.

Some servers can be considered as object managers. A server typically runs in a separate address space and takes requests in the form of messages.

The duality pointed out in Section 12.8 requires the existence of a number of concurrency features such as dynamic process creation in the shared memory approach and the ability of the server to select which messages it is prepared to service in the no-shared-memory approach.

Possible difficulties

This material should not be too difficult. It is setting up a context for Chapters 13 and 15 instead of diving straight into the detail.

Teaching hints

An OS that runs in part of a process’s address space and is executed procedurally (such as UNIX) gives a good example of both shared memory IPC (within the OS) and no-shared-memory IPC between user level processes.

A file server is an example of an active manager of many objects of the same type. A server will invoke an operation on a specified file object on receiving a request from an authorised client. The server occupies a separate address space from the client. The request takes the form of a message.

There are many examples of the various forms of high level IPC given in Figure 12.7. Mentioning them could indicate their relevance to system design.

The duality of a monitor and a server could be covered in detail, as in (Lauer and Needham, 1978).

Chapter 13 IPC Without Shared Memory

Objectives

To study the ways in which IPC between processes which do not share memory is supported. The study is relevant to both single-computer and distributed systems - refer to the special characteristics of distributed systems covered in Chapter 5.

Points to emphasise

Files are mentioned only as a "first attempt". They are not suitable for a general IPC mechanism. Pipes may be thought of as files that are kept in main memory for reasons of efficiency and for which read and write accesses have been augmented with a synchronisation mechanism.

Message passing is the most general and flexible IPC mechanism for processes which do not share memory.

Recall (Section 8.6) the process interactions that may be needed.

The concepts of naming and binding are relevant to, and often motivate, the variations of message passing.

Synchronous message passing avoids the overhead of buffer management in the implementation. It is likely to push buffer handling into the application because heavily used servers cannot be expected to synchronise with clients to send replies to requests for work.

Note the overhead of message passing. Memory management can be used to reduce the overhead of copying.

Message passing is used in existing systems and languages.

Recall the duality discussed in Chapter 12.

Possible difficulties

There are a large number of variations on basic message passing. Some of this material could be left for private study once the basic approach is covered. It is important to have a thread of development.

Teaching hints

The approach taken is to consider the process interactions that might be required and to show the variations on message passing that are designed to meet these requirements. The students could be asked to go back to Section 8.6 and to discuss how the interactions set up there are met.

Naming and binding can be used as a focus. Do the processes involved in the interactions know each others’ names? How are sender(s) and receiver(s) named (directly or via an indirection)? How and when does the binding between sender(s) and receiver(s) take place?

Practical work would be useful to reinforce this chapter.

Chapter 14 Crash Resilience and Persistent Data

Objectives

To extend the definition of "atomic operation" to include an operation on data held in persistent store. To consider the possibility that a crash might occur at any time and to consider how this should be taken into account in system design. To outline approaches to implementing atomic operations on persistent data in the presence of concurrency and crashes.

Points to emphasise

At this stage we are focussing on a single computer (not a distributed system) and introduce operations on data in persistent store.

The contents of main memory may be lost on a crash at any time.

A persistent programming language (PPL) might be available to manage the persistent data. More likely, a DBMS will manage fine-grain persistent data by means of files which it uses via an OS. We assume a DBMS with concurrent users.

An end user may receive a result from the system (such as a DBMS) and go away. That result must have been stored persistently.

We are concerned with achieving atomicity in the presence of crashes.

The techniques already covered in Part 2 have solved the problem of atomicity (of a single operation invocation on data in main memory) in the presence of concurrency. We assume the DBMS or PPL will control concurrent access to shared persistent data.

An idempotent operation is repeatable. It is not possible to make all operations idempotent.
An atomic operation in the presence of crashes is implemented by means of a great deal of mechanism such as storing before and after object state in a log or maintaining an old version of an object intact while the new version is built up.
The concept of atomic operation is widely applicable in systems and applications, in particular, in distributed systems as we shall see later.
An abstract operation should be an atomic operation, see Gleeson (1989).

Possible difficulties

To restrict the discussion to one operation invocation on one object to introduce the concepts as simply as possible.

Teaching hints

Start from an operation on an object in the main memory of a single-computer system. The result of the operation is lost on a crash. You must store the result in persistent memory before you tell it to the user. Consider application areas where this is important.

The logging and shadowing approaches to achieving atomicity are introduced here. It is a good idea to set exercises at this stage, in this simple context, to have the concepts in place before the complexity of composite operations is introduced in Part 3.

Consider a workstation of the future with a large main memory, say 400Mbytes. How might this memory be used? Should there be a substantial amount of non-volatile main memory as well or should we rely on remote file servers, accessed across fast networks, for persistent storage?

Chapter 15 Distributed IPC

Objectives

To study how to achieve a distributed implementation of IPC.

Points to emphasise

The subject matter so far in Part 2 is relevant to both centralised and distributed systems. We are now focussing on what is needed, in addition, to achieve a distributed IPC implementation.

Refer to Section 5.5. The absence of global time is not a big issue in this chapter. Nor is inconsistent or incomplete state. The independent failure modes of components of a system is the main concern.

The client-server model can be seen as a simple case of a more general object model. Servers manage objects on behalf of clients and invoke them on request.

Any IPC mechanism can be distributed. We show how to distribute message passing and procedure call.

An RPC system provides session and presentation services and can be integrated with a language in which applications are written. That is, starting from language-level IPC we are, in addition, supporting data representation and transfer and cross-machine synchronisation. The RPC protocol allows for congestion and failure.

ISO have to consider communication among heterogeneous systems. Many systems are homogeneous and can be implemented more efficiently than ISO presentation, for example.

Possible difficulties

The students may lack experience and understanding of communications and the communications services which distributed IPC must use. For the first time we are devising a protocol (RPC) which is assumed to sit above a transport service, such as UDP, and below an application.

When we move to a distributed rather than centralised system the issue of system-wide naming (also location and binding) must be considered. I believe a full treatment of naming etc. belongs in an advanced course which covers large-scale systems.

Teaching hints

It is good to look at independent failure modes first in the context of a simple RPC system (one client, one server). The protocols of Chapter 21 (e.g. two and three phase commit) and the Chapter 22 (distributed mutual exclusion) rely on this basic knowledge.

If the students have already studied communications in another course make use of this.

Some curricula may cover the material of Chapters 15 and 22 in a later course. In this case it would be useful to end Part II with an outline of how IPC can be distributed in order to form a link to that course.

PART III CONCURRENT COMPOSITE ACTIONS

Chapter 16 Decomposable Abstract Operations
Objectives

To extend the study of atomic operations to those which comprise lower level atomic operations. To study the problems deriving from concurrent execution and crashes.

Points to emphasise

We assume that concurrency control at the level of a single operation (to achieve atomic execution) is enforced by some means as described in Part II.

An operation which is meaningful at some level may comprise lower level operations that already exist as atomic operations in their own right.

It is desirable, to achieve high performance, to allow concurrent (interleaved) execution of the components of high level operations.

Uncontrolled concurrent execution can lead to incorrect system state and incorrect output to the client.

An attempt, based on locking, to control the interleaving and avoid these problems can lead to deadlock.

A crash part way through a composite operation can leave persistent state inconsistent. Some sub-operations have completed, some haven’t.

We shall use the object model introduced here as a basis for reasoning throughout Part III. It is set up in more detail in Chapter 18.

Possible difficulties

The conceptual difficulty lies in considering levels of atomicity. In Part II we constructed atomic operations from a number of machine instructions. We are now constructing higher level atomic operations from atomic operations. Why are the problems different? Because we do not execute (implement) the high level composite operation as a single atomic operation but allow the execution of its components to be interleaved.

The material should be straightforward at a superficial level as it is just setting up problems that we have to solve.

Teaching hints

Ask the students why we shouldn’t implement the execution of every meaningful operation, however high level, as an atomic operation. Get them to convince themselves that interleaved execution is a good thing to aim for. Draw on familiar examples from DB. Speculate on the performance of an OS that single-threads through every composite operation.

Discuss concurrency control (done in Part II) and crash recovery for Chapter 14’s single atomic operation.

Discuss the likelihood of conflict in practical examples. Prepare the ground for a later discussion of pessimistic and optimistic approaches to concurrency control.

Deadlock is traditionally introduced with the Part II material. It may be of interest to say so and to give examples from OS and concurrent programming. Or wait until the next chapter.

Cover the object model in the example in Section 16.5 to prepare the ground for later chapters.

Chapter 17 Resource Allocation and Deadlock

Objectives

To establish the requirement for the dynamic allocation of objects and to study the possibility that deadlock can result from this. To study how deadlock can be detected, prevented and avoided.

Points to emphasise

If you have complete, static information of the objects required by processes and the order in which they are requested you can plan a schedule in advance and can avoid the possibility of deadlock.

There are allocation policies that lead to the possibility of deadlock: exclusive access (in the sense that a request can be refused); hold while waiting; and no pre-emption. If these policies must be used for the dynamic allocation of certain objects then it is possible that the order of requests for these objects by processes will be such that deadlock occurs.

One approach to the management of resource allocation is to grant every request that you can, delay any request that you can’t grant and detect deadlock. If you have no information about the resource requirements of processes in advance of their requests this is all you can do.

If you have more information, such as the total resource requirements of each process, you can devise algorithms to prevent deadlock.

Possible difficulties

The deadlock detection algorithm often seems counter-intuitive. Emphasise that we are looking for a cycle that exists NOW. Any resources that are not involved in such a cycle can be put back in the pool (at the moment). If we run the algorithm again later, after more requests have been made, these resources may have become part of a cycle.

Teaching hints

Give examples of deadlock, livelock and starvation from many areas: concurrent programming, communications protocols, OS resource allocation, DB etc.

Give practical examples to reinforce understanding of the algorithms.

These can again be taken from different application areas.

Exercise 16.5 can be used as a small project on the dining philosophers problem. Pages 80 - 84 of this guide contrast solutions which are specific to that problem (of symmetric concurrent processes) with the general approaches to deadlock detection and prevention given in this chapter.
In discussing how a system might recover from deadlock, point out that we shall be looking at this for transaction processing systems throughout Part III.
The appendix covers two centralised mutual exclusion algorithms in detail and three approaches to distributed mutual exclusion algorithms. A discussion of deadlock is included in the text and the exercises. This aspect of the appendix could be used here.

Chapter 18 Transactions

Objectives

To establish the ACID properties of transactions. To establish a conceptual framework (an object model and serialisability theory) as a basis for studying how these properties can be maintained under concurrent execution and crashes.

Points to emphasise

Consistency (C) and Isolation (I) are the concerns of concurrency control (Ch 19). Atomicity (A) and Durability (D) are the concerns of crash resilience (Ch 20). We shall focus on C and I until we reach Chapter 20.

C and I are maintained by a serial execution of transactions; that is, by implementing each transaction as a single atomic operation. We have argued (Ch 16) that we need more concurrency than this.

C is maintained by a serialisable execution of transactions. Section

18.6.1 defines conflicting operations in the context of an object model and

18.6.2 gives a definition of serialisability in terms of conflicting operations.

An execution schedule for the operations of a number of transactions can be represented as a directed graph. This is called a history. We require a transaction history to be serialisable.
A serialisation graph is derived from a history and shows only transactions rather than the operations of transactions. We prove (in 17.8) that a transaction history is serialisable if and only if its serialisation graph is acyclic.
The behaviour of the system must be as though the property of Isolation was enforced in the implementation of transactions. If we optimistically relax this property to achieve higher concurrency we have to be able to recover when things go wrong.
We are not attempting to cover nested transactions explicitly.

Possible difficulties

The material in this chapter is fundamental and conceptually difficult. Chapters 19, 20 and 21 rely on these concepts.

Teaching hints

Recall concurrency control (done in Part II) for Chapter 14’s single atomic operation. We now have composite operations with each component guaranteed atomic with respect to concurrent execution. What are the new problems?

Reinforce the idea of conflicting operations. Ask the students for examples.

Give as much time as possible to this material. The students should spend private study time on it. Make sure they know the definition of serialisability (18.6.2) and that a transaction history is serialisable if its serialisation graph is acyclic (18.8).

Isolation may or may not be enforced in the implementation. Define a strict execution. What are the trade-offs? Discuss optimistic-Vs-pessimistic assumptions. Relate the discussion to different application areas.

Chapter 19 Concurrency Control

Objectives

To study how the (C and I) properties of transactions may be ensured under concurrent execution.

Points to emphasise

The concepts established in Ch 18 are not just applicable to TP systems.

Concurrent execution of composite operations can take place in concurrent programs in main memory in centralised and distributed systems. The C and I properties must be ensured by some means.

2PL is a pessimistic method. It achieves Consistency through guaranteeing a serialisable schedule of the operations of concurrent transactions. It has the overhead of deadlock management. Objects are locked for longer than is necessary for their invocation. We can enforce Isolation in the implementation through strict 2PL.
TSO is a pessimistic method. It achieves Consistency by enforcing one specific serial order on the transactions. It is deadlock free but correct transactions can be rejected. Isolation can be enforced by extending to strict TSO. Decisions are taken at each object rather than centrally.
In OCC we assume that contention is unlikely. We could only achieve a guaranteed consistent set of shadows in OCC by a pessimistic approach. We therefore optimistically choose to invoke operations on shadow objects which may turn out to be inconsistent.
When a transaction requests commit in OCC a validation phase ensures that the shadows used by the transaction were consistent and that no conflicting updates have been made to the persistent object states since the shadows were taken. The updates are then guaranteed to be made at each persistent object in the order in which the transactions committed.
OCC achieves high object availability by relaxing the guarantee of consistency.

Possible difficulties

2PL and TSO with strict execution are straightforward. Without strictness, cascading aborts may result and state recovery procedures may be needed (see Chapter 18). OCC is more difficult to understand because it restricts concurrency as little as possible. It is implicitly non-strict because the set of shadows are not guaranteed to represent a consistent state. Invocations take place in isolation once the shadows are taken and when commit is requested all are considered together. Serialisability is achieved when the updates are applied.

Teaching hints

The examples given in the chapter are simple and intended to reinforce the concepts.

Discuss different types of system and their requirements. When must we have high object availability so that timing constraints can be met? When is contention unlikely? What are "hot spots"?

Figure 21.4 shows validation and update for OCC in a distributed system.

The solutions in edition 1 to the additional exercises for Chapter 18 contained some figures which are included for extra explanation in Chapter 19 of edition 2, Figures 19.8 - 19.11.

The OCC example in Section 19.6 (Figure 19.13) shows that our definition of conflict, based on serialisability, may be over-restrictive for this method. Although these issues are interesting they are taking us to current research.

Chapter 20 Recovery

Objectives

To study how the (A and D) properties of transactions may be ensured under concurrent execution and crashes.

Points to emphasise

C and I are the concern of concurrency control. Recovery procedures must ensure A and D.

We are not concerned here with creating a stable storage abstraction. We assume the system can make the probability of losing data in persistent store on a crash arbitrarily small.

As in Chapter 14 we assume a fail-stop model for a crash.

The effects of any committed transaction must persist.

The transaction is the unit of recovery. It is the unit which is meaningful to the application.

Theoretical models are based on the idea of an invocation history, starting from the creation of an object. Practical implementations are based on recording an object’s state. Section 20.2 justifies this.

A recovery log record must be written before an update in place is made.

Possible difficulties

The object model: holding a complete invocation history of an object allows ANY state to be recovered. In practice we have to record a limited amount of state.

The (write-ahead) log-based approach to recovery should be straightforward. Any difficulty may be associated with the switch from reasoning at an abstract level to practical implementation issues.

We are assuming that every operation can be undone and redone and that UNDO and REDO are idempotent in case of crashes during recovery. This is OK for the simple examples here but not so obvious in general. It is safe to assume that before and after state are available. Only prior state need be recorded in the (write ahead) log before the operation is performed.

Teaching hints

To follow through the recovery log approach focus on what is in volatile main memory and what is in persistent store. Recall from Chapter 7 that the unit of transfer to disk may be large. It is not realistic to write every update to disk synchronously.

Focus on the significance of commit; the transaction is the unit of recovery. You must be able to perform the invocations made by a committed transaction, even if this is from information in the log.

Chapter 21 Distributed Transactions

Objectives

To extend the study of transaction processing systems to allow for a distributed implementation. To study concurrency control and commitment in a distributed context.

Points to emphasise

We reiterate the special properties of distributed systems from Chapter 14. We need to consider independent failure modes and the absence of global time. We are not concerned with object replicas nor with inconsistency resulting from communication delay.

The methods of concurrency control in which decisions on operation invocations are taken locally at each object (TSO and OCC) distribute naturally. A single manager is assumed to co-ordinate object invocations and commitment.

At first sight it seems that we need a distributed deadlock detection algorithm to implement distributed 2PL. In practice a timeout mechanism and abort could be used instead.

All the objects involved in a transaction must make the same decision with respect to commitment or abortion. This is more difficult to achieve in a distributed system where components may fail independently of each other at any time. The two-phase commit (2PC) protocol attempts to achieve this. Note that there is a single point of decision in the protocol. More complex protocols e.g. 3PC specify detailed recovery procedures which are discussed informally here.

Atomic commitment is pessimistic and is contrary to the philosophy of OCC. A two-phase validation protocol is outlined for OCC during which shadow objects may be taken for the execution phase of other transactions.

Possible difficulties

As in Chapter 19, OCC is the most difficult to understand because it is inherently non-strict. The discussion should grow out of Chapters 15, 18, 19 in a natural way.

Teaching hints

The approaches to distributing the various methods of concurrency control are outlined in the chapter. In each case, the students could be asked to fill in the detailed actions of the manager and participants. For example, ask the students to specify a transaction manager which uses distributed 2PL based on timeouts instead of an algorithm for distributed deadlock detection.

Recall previous discussions on pessimistic and optimistic approaches to concurrency control. An atomic implementation of commitment is associated with pessimistic methods.

Ask how atomic commitment is achieved in a centralised implementation.

What are the new problems in a distributed implementation. How can a single point of decision be enforced in the presence of failures? What must a node record in persistent memory in case of failure? How is this used in the recovery procedures after failure?

Recall the validation and update procedures for OCC. What additional problems are introduced in a distributed implementation?

Chapter 22 Distributed computations

Objectives

To introduce and criticise distributed algorithms and protocols. To constantly bear in mind the fundamental characteristics of Distributed systems introduced in Section 5.5.

Points to emphasise

The basic algorithms are easy to describe. The difficulty comes from considering that any component can fail at any time and we can’t tell whether failure of a message to arrive is due to failure of the network or the sender or congestion.

If a distributed algorithm replaces a single bottleneck with n bottlenecks it’s useless.

The basic assumptions of the algorithms may be that the number and identity of participants is known to all of them. One should then consider how this can be varied in response to failure, network partitions, members leaving and joining.

Possible difficulties

The implementation of causal ordering of message delivery, Section 22.4, is difficult. A thorough treatment needs a mathematical basis. The solution presented is based on the assumption that we have a stable process group and everyone sees all the messages.

Teaching hints

Some algorithms e.g. quorum assembly, depend on the number of participants.

Starting with process groups helps to establish this.

Note the layering in the Section 22.4 on ordering the delivery of messages. messages arrive at a node but may be queued at a low level and delivered later to the application after other messages.
Distributed mutual exclusion should start with a discussion of applications where processes each need a copy of the code and where it may reasonable be distributed - i.e. object managers may handle the critical sections.

PART IV CASE STUDIES

Chapter 23 UNIX

Objectives

To understand the structure and behaviour of the UNIX kernel. To criticise the basic design and to study how more recent UNIX systems have met these criticisms.

Points to emphasise

UNIX is about thirty years old.

There are good and bad aspects of the basic UNIX design, see Section 21.15.

IPC was probably the worst aspect of the original design. System V has rewritten IPC. It offers shared memory with semaphores and local message passing. BSD developed networking which subsumes pipes.

UNIX is executed procedurally (in-process).

The UNIX process model is heavyweight (one process per address space).

Threads packages implemented above UNIX give only language-level threads.

UNIX sees only one process per address space.

System calls are synchronous and potentially blocking.
It is difficult to build systems above UNIX. We can’t have the multithreading we need. This is bad for both system services (e.g. communications packages such as RPC, see Figure 15.17) and higher level applications.
The UNIX interface has become a de-facto standard. More recent systems are constrained to support the UNIX interface so that existing software will still run on them.

Possible difficulties

The material should be easy to understand since we have often used UNIX as an example.

Procedural execution of the UNIX kernel can make some of the detail difficult to understand. How does bottom-up I/O handling get done? What happens when a process goes to sleep?

Teaching hints

Avoid a flat description. Take a critical approach. UNIX is so successful it can take it! Why is it so successful? Are other commercial systems even worse?

If the students can use UNIX systems set up practical work to illustrate command composition, dynamic process creation, pipelines of processes and so on. If System V or BSD are available, explore the new IPC facilities they provide. Compare their use for same-machine IPC with the original IPC.

Appendix B sets up an example of how a buffer cache is managed. The exercises could be answered in a UNIX context. Bach (1986) and Leffler et al. (1989) give solutions for System V and BSD respectively.

Chapter 24 Microkernels: Mach and CHORUS

Objectives

To give examples of the microkernel approach to OS design. To show how processes and IPC are provided in these systems.

Points to emphasise

Recall from Section 2.5 the motivations for a microkernel approach to OS structure. Recall the possible drawback that a given service will perform less well when implemented at user level than in the kernel.

Microkernels provide basic mechanisms via a set of abstractions. They are concerned with processes, threads, IPC (including networking), and memory.

Recall Ch 12 on approaches to message passing and Ch 14 on distributed IPC.

Microkernels designed for the commercial marketplace must emulate existing OS so that existing software may be run on them.

Note that the examples we consider, Mach and Chorus, have evolved to a similar design. They support multi-threaded processes; they use memory mapping to make IPC more efficient; IPC is based on message passing and the port abstraction is used as the destination for messages.

A current direction is to support applications with "real time" requirements.

Possible difficulties

The experience of the students may be such that they see no need for this change in approach to OS design (in spite of what they have learned "in theory").

Teaching hints

If the students have access to both a microkernel-based system and a conventional system such as UNIX, set up practical work to perform a comparison. For example, use a concurrent program where processes can become OS threads. Allow the threads to make system calls. Show that other threads can run.

Reiterate what the local closed system cannot do.

Focus on current and projected network speeds and speculate on the applications it becomes possible to support with such networks.

Again, in this chapter, we are leading on to an advanced course in distributed system design. Don’t try to cover too much here without a grounding in large scale system issues.

Chapter 25 Windows NT

New in edition 2

Objectives

To present the design of a widely used PC operating system.

To present a case study of object structuring in an OS design.

To contrast the provision of processes and threads with an old OS (UNIX).

Possible difficulties

Windows NT will evolve. Some students will know some versions intimately. Some local versions in use will have diverged from the original design presented here.

Teaching hints

Note the choice of the privilege boundary, excluding the window manager, and the use of processes and threads.

Here’s a concrete example of a process descriptor and a thread descriptor.

If the students use UML, or some other O-O method, for software design, some aspects of NT could be specified in this way.

Chapter 26 Middleware: CORBA and Java

New in edition 2

Objectives

To present and contrast two current approaches to supporting distributed software.

To show two implementations of an object model for distributed software and two examples of a distributed software architecture.

To consider interpretation as a way of supporting a single language on heterogeneous hardware and OSs, at some performance cost.

To show how heterogeneity is supported by CORBA IDL and different language bindings.

Possible difficulties

CORBA is large and is described in many thick standards in very general terms. If the students don’t get hands-on experience they find it difficult to see what’s really going on. Referring back to Chapter 15 on RPC should help with stubs and skeletons.

This area is evolving rapidly. Java was new at the time of writing and was used mostly through web browsers. Java RMI and JavaBeans were immature. CORBA was more established but closely bound to C and C++. Advanced features such as DII were not widely implemented.

Teaching hints

Practical work on Java using JVM and web browsers or using Java RMI is easy to set up. Students may already have used it for concurrent but not distributed programming. It’s suitable for individual project work too, e.g. a distributed diary and room booking system using RMI could be a final year undergraduate or Master’s project.

Discuss whether Java is a closed-world, single language approach to distributed software development. CORBA is predicated on the requirement for general interoperability.

List major companies that are committed to CORBA to show how serious a contender it is e.g. banks, telcos.....

Provide a practical CORBA environment if possible. Again, final year undergraduate or Master’s projects could use it.

Consider how both Java and CORBA are looking to support persistence and transactions.

Chapter 27 Transaction Processing Monitors and Systems

Objectives

To show the design issues in implementing distributed TP systems.

To consider TP case studies in financial applications.

Points to emphasise

The relevance of the issues we have discussed throughout the book to this application area. In particular, the use of: address space, process, thread and IPC mechanisms such as message passing and RPC.

This particular application area has a strong requirement for security.

We have not set up the basis for an in-depth study of security policies and mechanisms. This topic is left for further study on the whole. Aspects that are needed for the development of the material are presented in-line.

Notice the use of batches of transactions.
Notice that there is not a requirement for universal consistency of data replicas at an arbitrarily fine time grain.

Possible difficulties

The first section may look like "more of the same". The change of emphasis is that we are taking a concurrent application and looking at how OS facilities can be exploited.

Lack of knowledge of security issues may prove to be a difficulty.

Personal experience of these systems should offset this.

Teaching hints

Emphasise "what the TP system implementor needs to know about the underlying OS". If you are to buy a TP system to run above your OS you need to know whether they are compatible. It may be that you are intended to buy the hardware and OS as well.

Recall that we discussed the design and checking of capabilities in Sections 7.6.6 and 15.9.1. The use of a one-way function was outlined there. We take a similar approach to PIN issuing and validation.

Draw on the students’ experience of ATMs. What services do they offer?

Can you change your PIN on-line? Have cases of fraud been in the news?

How was the fraud perpetrated? Is it bank policy to deny internal fraud?

Appendix: Evolution of computer systems

This is included for reference throughout the book and is intended to provide a historical perspective on the evolution of technology and the system designs that evolved in response.

Part III

SOLUTIONS TO EXERCISES

with examples and additional exercises

Chapter 1 Introduction: Examples and Requirements

Examples

1-1 What is a real-time system?

A system in which the timing constraints are dictated by the environment.

Why is it useful to distinguish "hard" from other real time systems?

In hard RTS the timing requirements MUST be met. Failure to meet such a timing requirement means that special actions, such as "catastrophe" procedures, are invoked. Other systems are more likely to carry on with degraded performance.

In what ways do operating systems which support real time systems differ from those which support single user workstations and multi-user, timesharing systems?

In general, in the area of process scheduling, see Chapter 6.

Soft real time systems: may be similar, many new application areas are under research such as multimedia in general purpose systems.

Hard real-time systems: it is often possible to devise a schedule for the whole system statically. Processes typically need processor time periodically.

1-2 Classify the following as hard real-time, soft real-time or non-real-time systems:

An embedded computer system which controls the fuel mixture in a car engine. HRTS

A robot controller for a car production assembly line. HRTS

An on-line library catalogue system. NO

A single user workstation which supports graphics, digitised voice and video. SRTS

A world-wide electronic mail (email) system. NO

A network of computers for a banking system. SRTS

A hospital patient monitoring and control system. HRTS

An on-line medical record system. NO

A weather forecasting system based on a model of the earth’s atmosphere. RTS at a coarse time grain.

An international stock exchange computer system. NO

Do any of the above systems have to handle concurrent activities? All do.

1-3 Give examples of monitoring and control activities in real time and non real-time systems.

This is designed to draw on the students’ experience. A few examples are:

from real-time monitoring and control:

Military "command and control" systems have to monitor to detect, for example, incoming missiles, and to control anti-missile missiles in response.

Hospital patient monitoring systems must detect danger levels in heart function etc and take appropriate action.

Procedures to detect the build up of congestion, for example in computer networks, and to invoke control procedures to avert disaster, for example to divert traffic to a different route to avoid the congestion.

1-4 Define three general models for concurrent algorithms.

Parallel execution of the same code, in components of a distributed system for example, operating on partitioned data. An example is bank accounts partitioned by branch.

Pipeline: partitioned code, data flows past the code phases.

Divide and conquer.

On what kinds of hardware could you run a concurrent algorithm and expect to achieve better performance than on a uniprocessor? Are any new overheads introduced for the hardware you mention?

Basically, you need to exploit more than one CPU. Shared memory multiprocessors and distributed systems are the obvious examples. The new overheads are associated with the co-ordination that has to be done in a non-sequential algorithm.

1-5 What is a concurrent system? See Section 1.4.

Can a concurrent system run on a uniprocessor computer? Give examples to justify your answer.

YES. Many small scale multi-user timesharing operating systems, such as UNIX systems. Single-user systems may be able to carry out many activities in parallel. A simple word processor with interactive edit is a concurrent system, for example.

Components of process control systems.

1-6 What is a shared memory multiprocessor? Are any of the approaches for devising concurrent algorithms particularly suited to this kind of architecture?

A number of processors see a single shared memory.

All the processors can access all the data in memory. The processors can execute the same copy of the code at the same time. It is most suited to algorithms which require fine-grained co-operation, rather than independent working on partitions of large volumes of data.

1-7 What is the difference between communications handling for a multicomputer system with hardware-controlled connections between specific computers and a network-based multicomputer system?

In the former we have hardware-controlled routing of data from a source to a destination, determined by the system topology, and error checking and recovery similar to that in an internal bus. In the latter, communication (from anywhere in the network) is handled by (a large amount of) software.

PART I BACKGROUND AND FUNDAMENTALS

Chapter 2 System Structure and Dynamic Execution

Exercises

2-1 What are modules, abstract data types and objects?

Refer not only to Section 2.1 but to related work in courses on programming language design and software engineering.

2-2 What is a process?

A program under execution by a single thread of control. (at this stage multi-threaded processes have not been introduced).

2-3 Describe how a program, comprising a number of object code files, is loaded and run by an operating system. Which operating system functions are invoked in order to create the required environment for the program to run?

You give a command to a command interpreter (or indicate the same requirement to a GUI). The object code files and libraries must be located (file management) and linked into one or more load modules. Sufficient memory must be acquired (memory management) to load the program. Some or all of the program (see later) is loaded into main memory and a process is created (process management) to execute the program.

2-4 What are the main functions of operating systems in general?

Device, network, file, memory and process management.

What functions would you expect to be present in operating systems for:

A process control computer with a sensor for monitoring, an actuator for control and a network connection for reporting to and receiving commands from a control centre?

Device (sensor and actuator), network, memory and process management.

A dedicated, network-based filing machine or "file server"?

Device (disk), network, memory and process management.

A computer dedicated to controlling the communications passing between two networks; that is, a "gateway" computer?

Network, memory and process management.

An autonomous personal computer?

Device (keyboard, screen, printer), network (?), file, memory and process management.

A single-user workstation with services available across a network?

Device (keyboard, screen), network, memory and process management.

A machine dedicated to managing and answering queries on a database?

Device (disk), network, memory and process management.

2-5 What is meant by a closed operating system structure?

All the OS functions are provided through a single system call interface. No function can be rewritten to suit a special application area. Each function is designed for every instance of the OS. Software Engineering issues are associated with maintenance of a large module which must run on a variety of hardware.

What is a microkernel?

The minimal mechanisms that are required by all instances of an OS.

Other traditional OS functions may run at user level above the microkernel.

What are the advantages and disadvantages of closed operating systems and microkernel-based systems?

See the list at the end of Section 2.5.

2-6 Relate the definitions of modules and processes to the structure of operating systems. How might modules be used? How might processes be used?

In a strict hierarchy of modules a process executing in a module at a given level may invoke only lower level modules. Is it possible to arrange the operating system functions we have encountered so far into a strict hierarchy? What are the advantages and disadvantages of a layered structure?

Section 10.2 describes the strictly layered structure of the THE operating system. Exercise 10.7 describes the layers of the Venus operating system and the solution gives another reference on layering. The advantage of strict layering is that the system may be proved correct by starting from inner layers and progressing outwards. The disadvantage is that layering imposes an arbitrary restriction on the functions available to any layer. A given layer knows about layers below it but can only respond to requests from layers above it.

2-7 List the resource types managed by your local operating system e.g. files, directories, memory, devices, processes etc. Consider how they might be implemented as objects. Give the interface operations for each object type and outline the interactions involved in using each object.

2-8 Why is a global naming scheme required for objects within an object based system? Discuss the pros and cons of naming schemes based on hierarchical names and global identifiers (e.g. of 64 bits). How can such identifiers be constructed so as to be unique?

Distinguish between access control and concurrency control when a process requests to open an object for use. Distinguish between object handles and object names. Discuss the circumstances under which an object can be closed but retained and deleted.

Chapter 3 The Hardware Interface, I/O and communications

Exercises

3-1 By looking in Hennessy and Patterson (1990), for example, find out the following:

The execution times of typical instructions of CISC computers such as the VAX series, the IBM System 370, the Motorola 68000 series, the Intel 8086, and 80x86 series.

The execution times of typical instructions of RISC computers such as the Motorola 88000, the SPARC, the MIPS R3000 and the Intel 860.

In both cases note the instruction lengths and their functionality.

Now find out the rate at which networks and peripherals, such as terminals, printers, disks and RAM used as disks can accept or deliver data and also the unit of data that is accepted or delivered.

I leave the details for local research. The instruction execution times will be given as "number of clock cycles" and can be compared on this basis. A given architecture may be fabricated with different technology. Note that a number of RISC instructions are often needed to perform the same task as some CISC instructions.

3-2 What are the advantages and disadvantages of handling devices by a polling scheme compared with an interrupt driven approach? In what kinds of system does the application dictate which approach must be taken?

Polling is simple but the timing is not controllable at a fine grain. You look at a device when you get around to it and not as soon as it needs attention.

Interrupt driven software is more difficult to program but you have the information that a device needs attention and can respond.

In real-time systems you need to know immediately events such as alarms occur. It may be that your schedule allows you to poll for periodic events.

Multi-user systems where terminal lines may lie idle for long periods. That is, the devices that are active vary dynamically. Also, it is wasteful to poll while a user at an active terminal sits and thinks. A reasonable response to all users is a requirement.

3-3 How many memory accesses are made during the hardware exception handling mechanisms described in Sections 3.2 and 3.5? Estimate the total time to achieve the transfer of control from the interrupted program to the exception handling routine in both cases.

Memory accesses for the CISC based exception handling example: 3 The PC and Processor Status Register are written onto a stack in memory. The address of the ISR is read from a table in memory into the PC.

Memory accesses for the RISC based exception handling example: None

3-4 You have hardware support for seven priority interrupt levels. On what basis would you assign these priorities?

It depends on the application area of the system and whether the priority assignment is static or dynamic. For a general purpose multi-user system consider whether devices are dedicated to individual users or shared. The latter should be given priority over the former.

Consider the throughput of the device. It is desirable to keep the disks busy. Especially so in file servers.

There are trade-offs in network handling. It may be possible for very high speed networks to deliver more data than a system can cope with, causing other tasks to run too slowly.

The clock need not have very high priority. It "ticks" relatively slowly compared with instruction execution time.

3-5 What is direct memory access (DMA)? How can a) a single block b) several blocks of data be transferred between main memory and a disk or network?

The processor initiates a DMA transfer but can execute unrelated instructions in parallel with it. An interrupt indicates the end of the transfer.

To initiate a DMA transfer, the disk address, memory address and amount to transfer are passed to the disk controller. It is then instructed to start the (read or write) transfer. This procedure copes with a single block or a number of contiguous blocks on the same track. Some disk interfaces (scatter/gather) are able to transfer to or from a number of non-contiguous blocks in memory to a contiguous area of disk.

3-6 Processors are usually designed to execute in one of two (or more) privilege states, for example, user and supervisor mode. When and how is the state change from user to supervisor mode made?

As part of exception processing.

When and how is the state change from supervisor to user mode made?

By using a special operation which is able to change the processor status as well as transfer control.

Which instructions would you expect to be privileged (executable only in supervisor mode)? What is the mechanism for preventing them from being executed in user mode?

Halt the processor, change memory management hardware, enable or disable interrupts, change the processor status, handle devices directly and so on.

A flag in the processor status indicates user or supervisor mode.

Privileged instructions executed in user mode cause an exception.

3-7 An application should be able to send an arbitrary amount of data to be output to a device. Devices transfer a fixed amount. How is this achieved?

Through the use of buffers. See Chapter 10 and exercise 5.

3-8 Error conditions are often detected at a low level in a system, such as in an exception handling routine. Why should they not be dealt with immediately? Devise a method to allow error handling at user level within the context of the application that caused the error.

Error handling at a low level is overhead on every process in the system. Errors should be handled in the time allocated to the process causing the error.

General purpose exception handling routines can be provided in libraries. Users may be able to supply special purpose ones. The low level error detection mechanism may set a flag (in the process descriptor-see Chapter 6) to indicate that on return to user level exception processing should be carried out instead of a continuation of normal processing.

3-9 How are exceptions handled in a shared memory multiprocessor?

The issue here is which processor should handle a given interrupt or exception. See Section 3.4.

3-10 Compare and contrast peripheral I/O and network I/O.

Network I/O is comparable with device I/O for shared devices.

Network and device interfaces are similar in function. Both may or may not use DMA.

In the case of shared devices, a single operating system controls all interactions with devices, even though data transfers may complete asynchronously. In the case of network I/O an independent computer system may initiate communication with you.

3-11 Define wide area networks and local area networks (see also Section 1.3.8).

The technical differences are becoming less clear as WANs increase in speed and LANs increase in complexity. A LAN may comprise different types of network, for example rings and Ethernet, perhaps connected by a high speed backbone network. Because of the smaller scale and high reliability, functions such as routing, flow control, error control and congestion control may be less complex in LANs than in WANs.

LANs are owned by a private organisation, WANs require PTT co-operation.

3-12 Compare and contrast the Ethernet with a ring-based LAN. Which design type will guarantee bandwidth between connected systems? What kind of applications need such guarantees? Are the guarantees usually made to the application level?

The Ethernet is probabilistic and guarantees of point to point bandwidth cannot be made. Guarantees can be made for some ring-based systems (e.g. those which employ ATM techniques). For example, the Cambridge Ring has an anti-hogging protocol so that a station must pass control on after sending a packet. Token rings may control the maximum amount that can be transmitted before passing on the token.

3-13 What is meant by connectionless and connection oriented communication?

Connection oriented communication has a separate phase for connection set up. This is usually an end-to-end procedure which imposes disproportionate overhead if only a small amount of data is to be transferred. After set-up the connection has a short identifier (compared with the name and address of the final recipient) and decoding and routing overhead is minimised. Resources may be allocated to the connection such as buffer space en-route. It is suitable for the transfer of a large amount of data and where a stream mode of communication is required. Connectionless communication allows a packet of information (a datagram) to be sent to a specified recipient without prior notification. The communication may be unreliable (unacknowledged) or reliable (acknowledged).

A given protocol hierarchy is likely to offer both styles of communication (e.g. UDP and TCP)

3-14 Which of the ISO layers would you expect to be implemented inside an operating system?

The traditional approach is to offer a transport service in the OS and a session service at the interface. Presentation issues are specific to the application.

Microkernel designs have experimented with implementing various functions outside for generality and inside for speed.

3-15 How do you think the ISO layers might be implemented in terms of the modules and processes introduced in Chapter 2? Try this question again after reading Chapter 4.

It appears that each layer has to be prepared to respond to service calls from above, to respond to unsolicited calls from below and to wait for responses to previous requests. At this stage it seems that we need several processes per layer.

After reading Chapter 4 it seems that several processes per layer would involve a large amount of context switching overhead which is highly undesirable in this area. After reading later chapters (8, 15) it seems that we should aim to have the communications service multi-threaded and allow a given thread to traverse the layers. We must have the ability to respond to unsolicited events, perhaps have a dedicated, listening thread and either a pool of available workers or dynamic creation of threads.

Chapter 4 Support for Processes

Exercises

4-1 Discuss how you would wish to allocate processes in the systems that were described in the introduction, for example:

an industrial process control system,

A process dedicated to each source of external event: each sensor, each actuator. One or more processes to handle network communication. If a separate process is used to perform calculations on the gathered data then a buffering scheme and synchronisation will be needed.

a multi-access (timesharing) computer,

Many arrangements are possible, see the UNIX case study for one example. One per active user (terminal line in the first instance). One to manage each independent device. One or more associated with memory management. One or more associated with any network connection. System services may have dedicated processes e.g. filing. Users may be allowed to create more processes dynamically.

A full answer should specify the function of each process and the points at which it must synchronise with client requests, device events, buffer access etc.

a powerful personal workstation,

The point to make here in addition to the previous discussion is that you don’t want to sit and wait while your document is paginated or your mail is fetched. You want to get on with many things in parallel.

a transaction processing system,

This is discussed in detail at the start of Chapter 27.

a parallel searching algorithm to run on a shared memory multiprocessor

One process per compiler phase. If there are fewer processors than phases the system’s scheduler will automatically carry out the organisation.

4-2 How would you simulate one processor per process in a uniprocessor and a shared memory multiprocessor?

By being able to multiplex processes on a processor transparently. The basic mechanism is to save the state a process has built up on a processor when the process must relinquish the processor. The state can be restored on the (or a) processor when it becomes available.

4-3 List everything that might be considered part of the state of a process. Which of these items must always be in main memory and which could be swapped out to backing store when the process was not running?

The items in Section 4.3 are not intended to be comprehensive for every possible system. use a local system’s process state descriptor as an example. See the UNIX and NT case studies.

It is essential to be able to record that an event has occurred or a signal has been sent to a process while a process is swapped out. Associated with this is the state change from blocked to runnable.

Other aspects of process state such as register contents and open files are only needed when the process is to run.

4-4 Design a method of implementing synchronisation between a peripheral and a single process which is dedicated to managing it.

One dedicated process - the process descriptor can be used. The interrupt service routine knows which process to inform that the event has occurred. it can be programmed to record the event and change the process state.

Design a method of implementing synchronisation between a peripheral and a process which is currently waiting for it to deliver data but is not dedicated to this task.

How does the interrupt service routine know where to record that the event has occurred? The peripheral may have taken work from a transaction record, indicating which process is awaiting completion of the task. The descriptor of this process could be used as above.

Design a method of implementing synchronisation between any hardware device and one or more processes.

Here we have the general case that any number of processes may be awaiting some event. The requirement may be that all should be made runnable and the first to be scheduled succeeds on obtaining, for example, a buffer made free by a transfer of its previous contents to disc.

We need event objects so that processes may indicate that they are awaiting some specific event and the event causer may signal the event.

4-5 Design a data structure which implements the representation of the processes known to an operating system. You should consider whether your system has a significant number of dedicated system processes and, if so, whether they will be held separately from user processes. What operations would you expect to provide at the interface of a process management module?

One possible structure was outlined in Section 4.5. Various alternatives are possible. Linked lists could be set up through a pool of descriptors. Separate lists could be held for blocked and runnable processes. Several run queues could be maintained either according to a fixed priority scheme, a dynamic priority scheme or according to recent behaviour, such as the most recent reason for blocking.

4-6 When does a general schedule need to be carried out in a multi-access system?

When the current process blocks.

How can a process in a multi-access system be scheduled differently when it is in a compute-bound phase from when it is doing input or output?

When it is in a compute-bound phase it will use up its time slice. It may then be scheduled round robin with other processes behaving similarly or a multilevel queue arrangement might be used. A lower priority queue has a longer time slice. If you use up your timeslice at one level you drop a level.

Section 4.6 sketched how priority could be associated with different reasons for blocking.

How many processes would you expect to be active in a single user workstation, a dedicated gateway computer and a dedicated file server?

One per separate user task, others for managing system resources.

One for each communication in progress (data from several separate communications might be arriving interleaved). Others to respond to new communications.

One for each client with work in progress. Others ready to respond to new clients. Others to manage storage devices.

How would you expect processes to be scheduled in a multimedia workstation, see Section 1.1?

The additional requirement is that media streams must be serviced periodically at rates appropriate for the particular medium. Synchronisation between different streams may be required (e.g. talking head). A fixed allocation of priorities with pre-emption is not suitable, for example, a video stream might be given high priority to achieve the required throughput. It is necessary for a voice stream to be given time regularly, even if it assigned a lower priority than video.

4-7 How does scheduling for real-time systems differ from scheduling for multi-access systems?

For many real time systems it is possible to work out a schedule in advance. For example, it may be known that a certain process is periodic with period 20 units of time and it must run for 4 units of the 20. Other processes are specified similarly. The normal working schedule can be precomputed. Alarm conditions cause special catastrophe procedures to be invoked.

4-8 What approaches can be taken to scheduling for shared memory multiprocessor systems?

This question will also be relevant later when we discuss the freeing of processes that are blocked on a semaphore or condition queue.

The basic issue is whether we should attempt to schedule co-operating processes at the same time on different processors, for example the threads of a process. Should the application be able to specify which threads should run together? Or should the kernel be kept simple and provide the least possible mechanism?

4-9 Recall the layered communications software described in Chapter 3. Consider the situation when several user processes have made requests for network input or output. How might the layers be executed by processes? How might synchronisation between user-level processes and the arrival or transmission of data at the network be arranged? Where might network buffers be located with respect to the ISO layers? Why would it be a bad idea to have a process execute each of the ISO layers?

A good paper to read on this general area is:

Clark D D, "The structuring of Systems Using Upcalls" ACM 10^th SOSP and Operating Systems Review 19(5), Dec 85.

The point is made in the paper that communication is initiated both top down and bottom up. Procedural invocation is often set up for top down invocation but not for bottom up - this is achieved by an asynchronous signal mechanism. The client should be given a choice between this and an upcall mechanism. Clark suggests that each layer could comprise a number of modules that can be invoked procedurally top down or bottom up.

This is an example of the need for a multi-threaded service, so that threads can work on behalf of different clients but can access shared system data.

Although each layer provides a service for higher layers it is not a good idea to dedicate processes to providing layer functions. This would involve too much context switching overhead.

We assume that a pool of buffers is available for use by applications and incoming data. In Chapter 6 we introduce the idea of mapping to avoid copying. A similar idea could be used here. An application could request some number of pages from a buffer pool which could be mapped into the process address space and mapped out again on transmission. We do not assume buffer areas associated with each layer - layers in general add or remove headers and possibly trailers.

4-10 For a given modular operating system structure, what is the minimum set of dedicated, resident system processes that can be used? How does the rest of the operating system get executed in this case?

It depends on the type of system. Assuming a general purpose system, there must be a process that is permanently resident in memory that can organise the bringing in and swapping out of other processes.

There must be a process that is ready to respond to the or a user coming on line.

How would you design to make use of a liberal number of dedicated system processes?

Associate processes with each system function and have services invoked by message passing.

For both of the approaches indicated above, discuss where a separate address space could be used for protection purposes. Assume that the system provides a mechanism for a process to make requests of the operating system. We have studied one such mechanism in Section 3.3.2. We shall expand on this in Part II.

The book does not emphasise hardware-enforced protection domains. This is a topic which might best be treated in a later course on special architectures. The trend has been towards software-enforced protection through type-checking in modular programming languages.

In outline, at one extreme a separate hardware-protected protection domain could be provided for every service. Architectures vary as to whether domains are considered to be self-standing or to be "in process". A domain switch would be needed on every service call. A capability architecture would be needed to support this.

Less general hardware support is achieved by the provision of rings of protection, for example, Multics was designed to have 64 but was built with 8. In this case, the OS and utilities may run "in-process" but may have varying degrees of protection associated with them. In a conventional system we have two rings or domains and it is common to provide the OS "in-process" as a single protected domain. With this model the OS occupies part of the address space of every process but is still protected.

An alternative approach is to use separate address spaces more liberally and to provide a mechanism for invoking a service in one address by a client occupying another. An OS could be provided in this way or could be subdivided into separate services.

4-11 In what circumstances might it be advantageous to use several threads of control within a single process?

If the process has several distinct tasks to perform or the same task on behalf of different clients.

4-12 Section 4.9 introduced a process management module and pointed out that, as this module implements the process abstraction, it cannot itself be implemented in terms of processes.

Within an operating system, the interface operations of the process management module may be called as simple procedures. In Section 4.4, for example, we saw the WAIT (event) operation invoking the BLOCK (process) operation.

In Section 4.10 it was mentioned that an operating system might be executed by a set of system processes taking users’ requests or might instead be executed "in process". For both of these models of execution, discuss how the invocation of process management can be incorporated into the model. (The problem to address is, if you are executing the operating system yourself, what happens when you block yourself?).

The question relates to the creation of an abstraction. We create the process abstraction, perhaps at the lowest layer of a system as in THE, or in a process management module. Above this level, or outside this module, all execution is carried out by processes. Below the level, or within the module, the process abstraction cannot be used. We have to contrive to execute the required code without the benefit of the abstraction.

4-13 What aspects of the state of a process are of concern to an operating system and a language system?

The OS is concerned with the allocation to a process of any resource that it manages. For example, the OS is concerned with the pages of physical memory that are allocated to a process. It is not concerned with the fine-grain representation of process state within those pages. That is the concern of the language run-time system.

4-14 Discuss how a sequential programming language can be used to implement a concurrent system. What assistance would you expect from a library package and operating system? What are the advantages and disadvantages of using a concurrent programming language?

Either separate tasks are programmed as separate programs executed by separate OS processes or the sequential language is augmented by library calls to support multiple threads of control within the program.

In the first case, it is necessary for communication between the components to be supported by the OS. This may mean that the system is not portable.

In the second case the underlying OS may be incapable of seeing more than one thread of control per program. We explore these issues throughout Part II.

4-15 What is the essential difference between co-routines and processes? If a concurrent program is to run on a uniprocessor machine, what advantages are there in using a language which supports processes? When might co-routines offer an advantage?

You program the scheduling of co-routines yourself; either the language system or the OS schedules processes. If you use co-routines you cannot respond to an event immediately. If you use language-level-only processes you still can’t. If you use language level processes known to the OS, and the OS supports pre-emptive scheduling and process priority you can arrange immediate response.

Co-routine switching incurs very little overhead. It might be that your application does not require immediate response to events; or it might be the case that pre-empting the processor on an interrupt is bad for some application areas; for example, a high speed network might interrupt the processor too often from work which has a deadline for completion.

Co-routines and language level processes run to voluntary suspension, so within a single OS process there can be no interference between them. They can maintain the consistency of data structures shared between them.

4-16 What are the potential problems of using a language-level "threads package" when the operating system sees only one process? Why might such a scheme be inappropriate for a shared memory multiprocessor?

Synchronous I/O (blocking system calls): the OS sees the whole process as blocked.

Response to events: the OS cannot schedule some specific thread when an event occurs.

The separate language level threads cannot run simultaneously on the processes of a multiprocessor.

4-17 What are essential requirements for real-time response to be achieved?

Note that real-time does not necessarily mean "very fast". It must be possible to bound the delay (that is, to know the maximum possible delay) between an event occurring and the associated (possibly user-level) process running.

Different application areas have different characteristics. Pre-emptive scheduling must be an option so that alarms can be serviced immediately they occur.

4-18 What is meant by a static specification of processes in a programming language? How would you expect a static approach to be reflected in the syntax of a language?

Processes are created at compile time rather than run time. You may be given syntax to create a "process" module or to bind a number of processes to a module.

How can dynamic process creation and deletion be supported in a concurrent programming language?

A common approach is to use a special kind of procedure call, often called "fork" which causes a child process to be created to run in parallel with the parent. The parent continues, the child executes the called procedure. On return from the procedure the child may explicitly or implicitly "join" with the parent, that is, be deleted.

How might a parent process determine properties of a child process it creates, such as its name or identifier? How might a parent process be given control over a child process once it has been created?

The child’s identifier may be returned to the parent when the child is created. The parent might be able to make system calls with this identifier as argument to request information on the child or control it.

You are working in an environment where application developers need to be able to use different scheduling algorithms. Which thread architecture would be most suitable and why?

Are there any disadvantages arising from using only kernel threads in an application?

Discuss the pros and cons of multiplexed threads compared with scheduler activation.

4-22 Discuss why soft real time requirements, such of those that involve dynamically arriving continuous media, might best be met with a combination of user-level and kernel level threads.

Additional exercises on Chapter 4

4-23 Discuss the relevance of the discussion of Section 4.6 on unary, binary and general scheduling to shared-memory multiprocessor systems.

Binary scheduling should be reconsidered in the context of multiprocessor scheduling. Suppose a process is running on each of the processors of a multiprocessor and an event occurs which frees some process. If the freed process has higher priority than the lowest priority running process it should pre-empt that process. (Assuming that we have pre-emptive scheduling and any process can run on any processor). If a running process makes another process runnable, the priority of that process should be compared with that of the lowest priority running process. If it is higher it should pre-empt that process.

4-24 Design the data structures that a kernel might maintain in order to manage threads.

Separate out the information that is specific to a single thread and that which is shared by the threads which belong to a given address space. The process descriptor shown in Figure 4.4 can be taken as a starting point. The exception address might be shared by all the threads of a process. otherwise a thread would need all the items mentioned. In addition, the process or address space identifier with which the thread is associated is needed. This would allow memory management information, open file information and so on to be shared by the threads of a process.

Design an OS system call interface for thread operations.

Note that threads will make use of the system calls for I/O etc that we have seen already. The additional system calls would allow a management thread of a process to control other threads. For example, a thread might wait on a user level semaphore and find it busy. The management thread should tell the OS that this thread can no longer run. We therefore need block (thread-id) and unblock (thread-id) calls. We also need a create call which returns a thread-id and a kill (thread-id) call. We may also need schedule and unscheduled in addition to create and kill or these may be combined with create and kill.

The system calls of Mach Chorus etc. may be consulted for a complete example specification.

Chapter 5 Fundamentals of Distributed Systems

has no exercises

Chapter 6 Memory Management

Exercises

6-1 Outline the basic memory management functions of an operating system.

To hold information so as to be in a position to respond to client requests for memory. To set up the memory management hardware so that address translation can take place and to respond to memory management hardware events. To keep track of allocated and free main memory and backing store. To manage the transfers between main memory and backing store.

6-2 Describe how copies of instructions and data are held in the various levels of a memory hierarchy: permanent disk storage, backing store for main memory on disk, main memory, cache memory and processor registers. When are transfers of data made between the various levels? In each case, indicate whether the transfer involved is controlled by hardware or the operating system and explain the interface between the hardware and the operating system.

Backing store for main memory: this may be a separate partition of a disk or a separate disk, sometimes with higher performance than those used for persistent storage (e.g. fixed head). Traditionally, when a new program is to be loaded it is set up in this backing store ready to be paged in. Note that this arrangement is not suitable for memory mapped files.

Main memory: transfer in on an MMU exception (e.g. page fault). Transfer out if the page has changed and main memory is needed for another page of this or another process (note memory sizes are increasing). Done by OS.

Instruction and data cache: transfer-in takes place when the instruction or data is accessed during instruction execution. Transfer out is to make room for a transfer in. Controlled by hardware.

Processor registers: by hardware during instruction execution.

6-3 What is the virtual address space of a process? How large is the virtual address space for address lengths of 16, 24 and 32 bits? How does a separate address space per process protect processes from each other and the operating system from processes?

The range of memory locations that can be addressed directly.

16 bit address: 64Kbytes

24 bit address: 16Mbytes

32 bit address: 4Gbytes

The virtual address space of a process defines the addresses which are available to a process. If processes have separate address spaces they are unable to address each others memory for reading or writing. If the operating system is run in its own, separate address space no user level process can read or write OS memory.

6-4 Give examples of why processes might wish to share parts of their address spaces. What kind of memory management hardware would support this sharing? How can memory protection be enforced when sharing is allowed?

Code sharing (e.g. of utilities or libraries) or read-only data sharing is transparent to the processes concerned. It allows the system to economise on the use of physical memory by avoiding multiple copies. Segmentation hardware (or paging hardware with segmentation managed by the OS) is needed with write protection on the shared code or data.

Sharing of writeable data areas allows processes to co-operate efficiently. They may wish to share only a portion of their address spaces. Fine grained segmentation hardware (many segments per process) would support this. If only paging hardware is available the OS may allow the processes to declare shareable regions of their data spaces and set up the shared pages accordingly.

6-5 What are the advantages and disadvantages of running the operating system in the address space of every process? What are the advantages and disadvantages of running the operating system as a separate process with its own address space?

OS and user-level process in same address space: OS may be entered by procedure call with mode switch. No switch of memory management tables is needed on entry or exit. OS may access user memory conveniently, e.g. to transfer data on I/O. Must have memory protection on OS memory. The architecture may support this arrangement.

If the OS is a separate process it is invoked is a uniform way as for other processes. Data transfer is like IPC (may use memory mapping). Context switching is needed on call and return. Memory protection is automatic cf. inter-process. Matches a message passing rather than procedural invocation model.

6-6 Why do systems which use segmented memory management have to solve the problem of fragmentation of main memory? Why do segmentation and swapping go together?

Segments are variable in size. If we use segmentation without paging, memory is allocated in variable sized contiguous areas leading to fragmentation. Swapping out then in again is more efficient for consolidating free store than memory to memory transfer. It can also be combined with scheduling policies. All this becomes less important as memory becomes larger.

6-7 Is segmentation transparent to the application programmer and/or the language implementor? What service is segmentation hardware designed to provide?

The application is able to indicate the structure of its virtual address space (via the language system implementation to the OS) so that appropriate protection and sharing may be arranged. If many segments per process are available the application may need to be able to specify fine-grained sharing. Segmentation may be transparent to the application programmer and be managed by the language system implementation (e.g. shareable code, non-shareable data segments). It is not transparent to the language implementor.

Is paging transparent? What problem is paging hardware designed to solve?

Paging is transparent and is for efficient physical storage management.

6-8 How can a large data structure which is in the address space of one process be made shareable by some other process so that they both see the same copy of the data?

By making it a segment of both processes. System calls may be provided to achieve this (e.g. UNIX System V).

How can a large data structure be transferred from the address space of one process to that of another without copying the data?

The system may provide facilities to achieve this - see message passing in Chapter 13. If a transfer is required (as opposed to simultaneous sharing) the pages occupied by the data area could be mapped out of one process’s address space and into the other’s.

How can a copy of a large data structure which is in the address space of one process be given to another process? If it is unlikely that both processes will write to the same part of the data structure how could the system avoid giving them a copy each?

Again, the data structure may be sent as a message with an indication that sharing, not transfer, is required. The pages of the data structure may be set to have "copy on write" access rights for both processes, see Section 6.9.2.

6-9 How many entries would there be in a process page table in a system with 32-bit addresses and a 1K page size?

2*22, (4096K or over 4 million).

How might the processes page tables be organised to avoid a large number of page table entries for a sparsely populated virtual address space?

The language system and OS co-operate in the organisation of soft segmentation. Instead of a flat page table, the OS maintains a number of segment or region page tables.

It has been suggested that a main store page table (sometimes called an inverted page table) might be used, instead of a page table per process, for pages in main store. What are the advantages and disadvantages of this approach?

Physical memory is typically smaller than virtual memory. It is feasible to store a single main store page table (MSPT) in main memory whereas process page tables might need to be swapped out, for example when a process is blocked. The MSPT scales with the size of physical, not virtual, memory. The OS must record both main store and backing store locations of pages of processes by some means. A suggestion is a single main store page table and a backing store page table per process.

The disadvantage is how to organise sharing. How do you indicate that a given page is shared by a number of processes, possibly with different access rights. Note that an associative store is still used for address translation. Each sharer has a separate entry there and that is where access checking is carried out during instruction execution.

Chapter 7 File Management

Exercises

7-1 Which component computers of what types of system would you expect to have local disks with associated disk management software and filing system software?

Examples are: any system which is not networked ranging from a PC to a multi-user system. A single user, networked workstation may have local disks or be diskless . Shared systems, similarly, may have local disks and also be augmented by shared fileservers. File servers are dedicated to this purpose.

7-2 What are the advantages and disadvantages of providing filing system software inside and outside the operating system?

The filing system is large and self-contained. It is not needed in every component of some systems (e.g. diskless workstations with remote file service). A small kernel is easier to develop and maintain. Also, providing the filing system as a service above the OS allows substitution of alternative filing systems tuned to meet the needs of applications.

There is more overhead involved in invoking a user-level service (IPC -Vs-system call, non-privileged service must make calls for privileged actions).

7-3 Explain how a directory may be considered as an abstract data type or object.

Although a user may own a directory he/she may not read or write it arbitrarily (even though we are told it is stored as a type of file). It must be used via the interface operations. The directory service functions as a directory type manager and maintains the integrity of the directory structure.

7-4 How might the modules and processes introduced in Chapter 2 be used to implement a filing system? Chapters 4, 7 and 8 will give a full explanation, although enough ideas have been presented in this chapter for a first attempt to be made.

At this stage we know that a number of clients may have requests in progress at the same time. Any one might be delayed for a DMA transfer. We have seen data structures in the filing system implementation that need to be read and/or written as part of satisfying a client request. We suggest a number of processes are able to provide the filing service function, each able to access the shared system data.

The modular structure at a coarse grain comprises a directory service and a file storage service. The latter may have a low level component which is a block storage service. At this level we may have a separate buffer management module. Below the storage service we have the disk device drivers.

7-5 Is a multi-level filing system such as a tree structure essential or just convenient in a multi-user system? How might a two-level system be used?

Many small or early systems have not had a full hierarchical structure.

CP/M has a single directory with the metadata on a file held in the file’s entry in this directory.

A two level arrangement could comprise a master file directory with an entry for each user’s user file directory (used in the early Tops10 OS for the Dec10).

The inconvenience of this kind of arrangement has become unnecessary as processor speed, memory and disk sizes have increased.

7-6 How can file sharing be supported in a filing system? What additional problems over normal file sharing are introduced if directory sharing is also allowed?

By access rights and by hard or symbolic links.

If hard links may be made to directories then existence control becomes more difficult than simply keeping a reference count with each file. The file material cannot be removed on a delete and we have to garbage collect periodically.

7-7 In what ways might a filing system provide concurrency control; that is, control of the simultaneous use of a shared file or directory? What are the advantages and disadvantages of the methods you suggest?

Assume that there is an open operation. There may be a policy to enforce single writer, multiple reader exclusion on open. In this case, you specify that you want to open the file for reading only or for reading and writing. This could be made more general and flexible by allowing you to specify shared or exclusive mode.

Concurrency control therefore takes place on open and each subsequent request must be checked as in accordance with the mode of use specified on open. An alternative is that there might be a separate lock service.

7-8 What are the functions of the directory service level and the file storage service level of a filing system?

DS: Maintaining the naming graph. Pathname resolution, access checking and possibly concurrency control checking on open. Handling any specific directory operation or any file operation with a pathname as argument (as opposed to a file "handle" or UFID for an open file). Existence control if this requires that the naming graph is traversed to detect unreachable objects.

FS: Managing storage allocation on disk. Allocation of disk blocks to files from free storage. Location of the disk blocks required to satisfy a given file-read or write request. Management of concurrency control - the FS maintains the data structures on open files.

7-9 File protection is achieved by a combination of authentication and access control (authorisation). Describe different kinds of access control policies that you might wish to implement. Discuss how these policies could be implemented.

Self, group and rest is commonly used and easy to implement but crude.

One may wish to have: a list of individuals and groups (with recursive evaluation of names in groups); a group with certain members excluded or with different access rights, access by anyone who is running some program, and so on.

Multics allowed lists to be specified flexibly (with groups and exclusions), Titan allowed lists (with groups and exclusions)+ access by program. The UNIX set-uid is simple to implement and allows access by program, see Chapter 23.

Once access by program is possible, then arbitrarily fine-grained access controls can be enforced dynamically. The program may interrogate you or determine the time of day, location of the terminal etc.

An example from Titan:

rights: F(full) R(read) D(delete) W(write) C(copy) N(none)

4 categories specified for users: Owner, Colleague, Friend, Anyone.

A directory entry contains, e.g.

FRRN % general access rights for categories

NFNN if user=jmb % check if jmb is a Colleague, if so F rights

FWWW if program=P % rights if running program P

FFNN if key-quoted=X % a colleague has full rights

% if she quotes the password X

7-10 Contrast the access control list approach to file protection with the capability approach.

~~FIGURE OMITTED~~

The figure gives a general context for discussing the two approaches. Instead of keeping a large, sparse access matrix, access to different types of object in a system may be managed either through ACL’s or through capabilities as appropriate.

A general and flexible ACL (see question 9) takes time to process.

Checking a capability is relatively efficient and constant in time.

Revoking traditional capabilities is difficult. Several solutions exist.

Could both methods be combined in a filing system?

Yes. The DS interface could use an ACL approach on open. The FS interface (particularly if remote from the DS) could be capability based. The DS translates from a pathname to a capability for a file.

7-11 Sketch an implementation of symbolic links held as separate (small) files. Outline the corresponding directory implementation.

~~FIGURE OMITTED~~

This is done in some variants of UNIX. It is useful to have a small as well as large block size available.

7-12 In some systems a file must be opened before it is used and the filing system maintains a context for open files; other filing systems are "stateless". What are the advantages and disadvantages of these approaches?

We assume a remote file server with many clients. If the server crashes the clients continue to work. A stateless server may offer a service immediately without trying to recover state by interrogating its clients.

A stateless file server cannot offer a concurrency control service. There is no notion of a file open for shared or exclusive use. It cannot assist in cache coherence if many clients have a file cached in their workstations.

7-13 It has been proposed that an immutable filing system should be used. Once a file is named by the filing system it may not be changed. A new file name must be generated if a change must be made to a file. Directories are still to be mutable. What are the advantages and disadvantages of this approach?

The advantage is that concurrent write access to a file version can go ahead without delay. The resulting updates will cause new versions to be created.

Immutability does not cause any problems to vanish. Concurrency problems re-emerge as version control problems. Can different versions be merged? How does one know whether one has the most recent version?

Note that if directories were also immutable we should create a new superior hierarchy every time we made an update to a file.

7-14 Filing systems often attempt to hold as much current data and metadata as possible in main memory. The advantage is to be able to satisfy clients’ read and write requests faster than if a disk had to be read or written. What are the possible dangers of this approach? Which application areas could not tolerate it?

There is no danger in holding a read cache. If you guess right you save a lot of time. The problem on holding a write cache in volatile main memory is that you might lose it all on a crash.

Some systems keep a read cache but not a write cache, others write metadata synchronously but delay writes of data, others use non-volatile RAM.

7-15 Take the file service operations given in Section 7.4 and work out a modular design with file service interface operations and nested modules for directory object management and file object management.

To do a complete solution to this exercise would be a huge project

the design of a filing system. The idea is to sketch the start of an object oriented design and to show how the file service interface operations invoke lower level services and objects. Some obvious simplifying assumptions are that we make the service single threaded and ignore crash resilience i.e. allow operations to be non-atomic. Let us also assume that a byte sub-range is specified explicitly rather than relative to a pointer.

The client interface to the whole filing system contains the operations:

a) those with pathname arguments:

create directory create file

delete directory delete file

list directory contents open file

set access rights to file or directory

link

b) those with UFID arguments:

read file write file close file

The interface to a directory object manager contains:

create directory % and enter in naming graph

delete directory % and remove from naming graph

lookup name in directory % for pathname resolution, returns an SFID

add entry to directory % invoked on create file or directory

remove entry from directory % invoked on delete file or directory

The interface to a file object manager contains:

create file % (assume the DS has managed the naming graph)

% create a metadata table entry, possibly allocate some storage, return a SFID.

delete file % (assumption as on create), remove metadata table entry,

de-allocate storage.

read file % return data

write file % write and acknowledge

At this level, the file object manager maintains a metadata table and can convert from a file identifier to a metadata table entry. From the metadata table entry it may determine the required disk pages within a filing system.

A lower level block object manager may be invoked to give or take the specified bytes. It, in turn, may invoke a cache manager to determine whether the desired block is in main memory in the cache or whether a request must be made to the appropriate disk driver.

Students could be asked to trace the invocation path through these object managers for each system call.

A paper from 1969 gives a basic layered design in detail:

Madnick S L and Alsop J W "A Modular Approach to File System Design" Proc AFIPS 1969 SJCC 34:1-12, (reprinted in Freeman P "Software Systems Principles, A Survey" SRA 1975).

7.16 For each of the allocation algorithms described in Section 7.5.2 work out the best case and worst case access costs for a given block n of a file in terms of disk and memory reads.

Additional exercises on Chapter 7 & Filing Systems

7-17 We have seen that metadata is held for each file and that the location of the blocks of the file on disk may be found via the metadata. Discuss the following alternative ways in which storage allocation might be organised and recorded. Discuss the suitability of the methods for various types of filing system.

a) Chaining: The metadata contains a pointer to the first block of the file. That block contains a pointer to the next, and so on. In addition, each block may identify itself as block x of file y to aid crash recovery.

b) The metadata contains a table of pointers to blocks of the file. To prevent the table becoming too large for large files, only the first few pointers point directly to blocks of the file. After that come pointers to blocks which contain pointers to blocks of the file and so on. An example is given in Section 23.6 and exercise 23-3.

c) The metadata contains an "extent list" for the file. An extent records the start-block and end-block of a contiguously allocated range.

d) A separate "file map" data structure is maintained for each filing system. The map contains an entry for each block of the range of blocks the filing system may allocate. Each file is represented by a chain through the file map. The metadata points to the first entry for that file (the first block of the file). The rest may be found by following the chain through the map.

Chaining has been used for PC systems such as MS-DOS and single user workstations such as Pilot. OK for sequential access only.

Tables of pointers with some direct pointers and some indirect are suitable for environments with a large number of small files and some large files e.g. campus systems. Not especially suitable for environments where files are usually large e.g. DBMS or multimedia (such as large voice or video files) where equally rapid access is required to all parts of a file.

A disadvantage of extent based allocation is its visibility to the user. There is likely to be a maximum number of extents because of metadata design. Above this number the system may have to consolidate by copying. It is a good method for systems where large and small files coexist and very large files are likely to have contiguous allocation.

A file map was used in the Titan OS. The method fell out of use as filing systems became large and main memory not so large that it could easily contain the data structure. It might be suitable for systems with a single large block size.

7-18 The choice of block size might also be discussed. Is there a single block size for the whole system or might it vary with the application domain of the system or the file type? A large block size gives an efficient unit of transfer but an inefficient use of storage if there are many small files such as those containing symbolic links.

Examples are that the UNIX BSD4.1 file system used only a 1Kbyte block size. BSD4.2 used two sizes: a block size and a fragment size. All the blocks of the file except the last are of some size fixed during configuration and depending on the intended use of the system, say 4Kbytes. The last block may be of some minimum size say 512 bytes or multiples of this minimum up to the standard block size. This can lead to a great deal of copying when a new file is being written from the beginning.

7-19 The general topic of measuring file system performance and organising disk storage to achieve high performance is not discussed in the main text. Performance optimisation can be taken as a specialist topic at a later stage.

A suitable topic at this stage is to discuss the idea of partitioning the disk into cylinder groups and partitioning the filing system so that the metadata and data for a file are allocated to the same cylinder group. The idea is used in Berkeley UNIX and aims to minimise long seeks.

We mention the likelihood of an I/O bottleneck as processor speeds continue to increase. As memory size increases, clients will be able to cache more files and write operations will dominate reads at file servers.

A popular approach is to organise writes to disk in a log structure; that is, instead of moving the heads to update files in place, all updates are appended to a log. The updates may be consolidated with the corresponding files from the log at times of light load.

A similar approach is to locate the function of acquiring free-blocks for writes at the lowest level in the filing system. It can be arranged that free blocks are scattered uniformly across the disk and a free block near to the heads can be acquired when needed. [Private communication from John Wilkes of HP Research Laboratories, Palo Alto].

Proc. USENIX workshop on File Systems, Ann Arbor, Michigan, 21-22 May 1992 is suitable for browsing for issues of current interest in file system design.

PART II SINGLE CONCURRENT ACTIONS

Chapter 8 System Structure

Exercises

8-1 Why is a single process inadequate to implement a dedicated service such as a file server?

In practice it may be necessary to work on several client requests at the same time. A single process would have to implement state saving and restoring on behalf of the clients, as one became blocked for I/O and another was to run.

Why are multiple processes each in a separate address space inconvenient for such an implementation?

Each process needs to access the common filing system data structures.

For what kinds of concurrent system would you expect to use a separate address space for each process?

When they are to run on the same machine: where interaction between processes is not required or is infrequent. Also where there is a requirement for protection between them. When they are to run on different machines they must run in separate address spaces.

To what extent can the memory management techniques studied in Chapter 6 be used to achieve the kind of data sharing that is needed in the implementation of a service?

If the process model of the OS is one process per address space and segmentation is supported, it could be arranged that separate processes each run the file service code but that they are able to share some of their data segments. The problem of which process should respond to which client would also have to be solved. This is essentially a naming problem. How is the filing system named when it is invoked? It might be that an interface process could take all requests and pass them on to free worker processes. A great deal of context switching overhead is involved if each process is to run in a separate address space. A multi-threading arrangement is preferable.

8-2 Consider the file service implementation given in Chapter 7. Which data structures must be accessed on behalf of every client of the service? Which data is private to an individual client? Could the memory management techniques studied in Chapter 6 be used to achieve shared service code, some shared data and some private data? (Consider the programming language storage allocation scheme outlined in Chapter 4. Assume that each process has a separate stack and that they share a heap.)

The shared data structures from the suggested implementation in Chapter 7 are the metadata table, the tables which hold information on open files, a disk block cache and/or buffer pool in main memory. Private data is that which relates to a particular client. In practice, a given file may be open for shared use, typically shared reading but sometimes for co-operative write sharing. Although each process has a private control stack, the bulk of the data structures need to be shared.

8-3 Give examples of interactions between processes that require one-to-one, many-to-one, one-to-many and many-to-many interactions.

For each example other than one-to-one, consider whether you could use a sequence of separate one-to-one interactions or whether you genuinely need an interaction involving many processes simultaneously.

Some examples are given in Section 8.6 to justify the classification.

Many distributed algorithms use a one to many pattern of communication. The distributed mutual exclusion algorithm in the appendix is one example. The two-phase commit and validation protocols of Chapter 20 are others. At this stage the point that a distributed agreement may be required is sufficient. The argument against a succession of one to one communications is the time taken, particularly if each message must receive a reply (acknowledgement) before the next can be sent as might be the case in RPC.

A debugger for a distributed system may wish to halt all processes as quickly as possible when a breakpoint occurs.

Many to one and many to many are set up in the text as naming requirements rather than a requirement for simultaneous communication.

8-4 Give examples of problems that can arise from uncontrolled access by concurrent processes to shared, writeable data.

The example given in the text is that an aeroplane seat, theatre seat, etc. could be double booked.

Another example is a chemical plant controller which is managing the production and packaging of some chemical. A process or processes are monitoring production and incrementing a variable amount. Another process reads amount periodically and when there is sufficient material, arranges for it to be removed for packaging then resets amount to zero and so on. Also, a process reports the value of amount to a management information system periodically. Various interleaving of the actions of the processes can cause errors. Records of small units of production can be lost between a test and clear, for example.

Chapter 9 Low Level Synchronisation Primitives: Implementation

Exercises

9-1 Support for synchronisation between processes and the hardware was described in Section 4.4. Concurrent processes also need to synchronise with each other. Discuss the approach of extending the methods described for process-hardware synchronisation to encompass process-process synchronisation. What kind of interactions could, and could not, be supported by the designs you devise?

Section 9.1 proceeds along these lines. This question asks the students to take the same starting point. Other designs than the one presented may be suggested.

9-2 Examine the processor handbooks of your local machines. Explore any composite instruction definitions with a view to using them to implement semaphore operations, in particular, WAIT(semaphore).

You may find composite instructions in the instruction sets, such as:

TAS (test and set a variable)

INC ( increment a value and set a condition code register)

CAS Register1, Register2, Memory

a) Write the entry and exit protocols for a critical region using each of the above instructions.

b) Show how the instructions may be used to achieve condition synchronisation so that a process may delay until a condition becomes true.

You may assume that processes co-operate; that some other process which exits the critical region or makes the desired condition true will take action to inform the delayed process.

The protocols should be presented at the machine instruction level so that the instructions involved and possible interleaving can be seen clearly.

TAS is discussed in the main text.

The VAX processor handbook gives seven instructions which may be used for this purpose: BBSSI, BBCCI, ADAWI, INSQHI, INSQTI, REMQHI, REMQTI.

Hewlett Packard’s PA-RISC provides a "load and clear" instruction. You can load a value into a register then test it there. If you find the value is already zero, someone else has the lock; if non-zero you acquired the lock by setting it to zero. The exit protocol is simply to make the value non-zero again.

Consider the use of a simple compare and swap instruction. The instructions of the specific machine should be substituted in the following. my-value and shared-lock may take the values locked or free.

entry: my-value := locked;

CAS (shared-lock, my-value);

branch on equal to entry;

in-CR: [critical region]

exit: shared-lock := free;

9-3 The SIGNAL and WAIT operations, provided as primitives by a kernel, are defined to be atomic. Discuss how this atomicity can be implemented. Consider the following cases:

hardware which provides a test and set or other composite instruction,

a uniprocessor, both when a composite instruction is available and when none is available

a multiprocessor with no composite instruction.

This is a review question to reinforce this important topic. A discussion of the first two points is given in Chapter 9. Two mutual exclusion protocols are given in the appendix. The DEC Alpha architecture provides hardware support for the atomic setting of a control variable which is not via a composite instruction but through special read (with "lock") and (conditional) write instructions, see Section 9.2.2.

9.4 - 9.7 from old appendix see CSbook/INDEX

Chapter 10 Low Level Primitives: Use in Systems and Languages

Exercises

10.1 - 10.6 are on the case study (from edition 1 appendix)

10-7 What are the problems of designing a system with a strict hierarchical structure? What are the advantages?

Each level in the hierarchy provides a service interface to higher levels. The model is of downward request and upward reply. It is difficult to choose a strict hierarchical ordering of functions. You may impose arbitrary restrictions by choosing some particular ordering as in THE.

"Haberman A. N. et al. "Modularisation and Hierarchy in a Family of Operating Systems" Comm. ACM 19(5) May 76 discusses the issues and suggests how the problems might be avoided.

The Venus operating system (Liskov, 1972) had the following levels (bottom up):

0: hardware,

1: instruction interpreter,

2: CPU scheduling,

3: I/O channels,

4: virtual memory,

5: device drivers and schedulers,

6: user processes.

Does this choice of layers solve the problems discussed for THE in Section 10.2? See Liskov 1972.

10-8 A buffer-object manager is to be implemented. Buffers are to be managed as a pool of fixed sized objects. A producer process first acquires an empty buffer, then performs a number of write operations into it until it is full. A consumer process first acquires a full buffer then performs a number of reads from it until it is empty.

Each object has a header part, for holding information on the use of the buffer (such as a count and a pointer), for links so that the buffers can be chained, etc. and a body part for holding data. A chain of empty buffer objects and a chain of full buffer objects are to be maintained. The object manager detects when a buffer becomes full.

Interface operations are proposed as follows:

buff-id := acquire-empty-buffer () % executed by producers

buff-id := acquire-full-buffer ( ) % executed by consumers

return-code := write-buffer (buff-id,bytes) % executed by producers

bytes := read-buffer (buff-id,byte-count) % executed by consumers

free-buffer (buff-id) % executed by consumers

Discuss the following:

use of the proposed interface operations and possible modifications,

the information that might usefully be kept in each buffer header,

how a full buffer, on writing, and an empty buffer, on reading,

should be detected and indicated to the invoker,

how exclusive access to a buffer-object might be ensured,

how the various chains might be updated correctly, secure from

concurrent access,

how the scheme compares with a cyclic buffer (see Section 10.3) for concurrent accesses by producers and consumers.

This type of arrangement can be used for terminal I/O. UNIX, for example, has c-blocks as buffers in a c-list data structure. Details can be found in Bach M. J. (1986), p331-334 and Chapter 9 of Leffler S J et al. (1989).

The appendix of this book sets up a buffer pool for disk blocks and discusses the problems of concurrent access to the shared data structures set up to manage the pool.

10-9 The classic sleeping barber problem is given as exercise 9 of Chapter 11. Solve this problem using semaphores.

waiting : integer :=0; %customers waiting to be cut

guard : semaphore :=1; % to delimit a critical region to protect waiting

customers : semaphore:= 0; %counting semaphore of customers barber : semaphore :=0; %is barber waiting for a customer (1) or not (0)

the barber executes the following program:

WAIT(customers); %sleeps if there are none WAIT (guard);

waiting := waiting - 1; %otherwise, changes waiting under exclusion

SIGNAL(barber); % and indicates his readiness to cut hair

SIGNAL (guard);

cut hair

a customer executes:

WAIT (guard); %test and set waiting under exclusion

if waiting < chairs % if there is a free chair to sit and wait

then { waiting := waiting+1;

SIGNAL(customers) %indicate one more customer

SIGNAL(guard) %release the exclusion on waiting

WAIT(barber); %use the barber resource

have haircut; }

else SIGNAL(guard); % if there are no free chairs just release

the exclusion on waiting and go away.

10-10 The classic dining philosophers problem is described in Section 17.5 and Exercise 17,5. Attempt the suggested solutions using semaphores.

A number of solutions are given with the solutions to Chapter 17’s exercises as part of a small project exploring this problem.

Chapter 11 Language Primitives for Shared Memory

Exercises

11-1 How would a compiler use semaphores to implement critical regions in a concurrent programming language?

Create a semaphore for each data structure declared to be shared.

At the start of a region ( region shared-data begin )

use WAIT (semaphore-for-that-shared-data).

At the end use SIGNAL (semaphore-for-that-shared-data).

11-2 Why is it inefficient to implement condition synchronisation within a critical region using an expression involving programming language variables?

The programmer may AWAIT an arbitrary expression achieving some value. We assume that the variables involved in an AWAIT statement are those accessed under exclusion in the critical region. The programmer is not required to SIGNAL when a condition is made true, so the language implementor must arrange to test all conditions whenever a process exits from a critical region in case a condition has been made true by the exiting process.

In contrast, in monitors which use condition variables, the programmer must declare the conditions on which processes may WAIT and explicit SIGNALs are required, using WAIT(condition-variable), SIGNAL(condition-variable). Guarded commands (Dijkstra, 1975) showed the way to a higher level approach than condition variables.

11-3 In (Hoare, 1974) it was proposed that the scheduling of processes waiting on a condition variable in a monitor could be priority based rather than just first come first served - a "scheduled wait". Syntax such as WAIT(condition-name, priority) could be used to indicate the ordering of waiting processes on a condition queue. Using this construct, write an alarm clock monitor with procedures wakeme(n:integer) and tick. The tick procedure is invoked on a timer interrupt at regular intervals to maintain a value of system time. The wakeme procedure is to allow a process to request that it should be blocked for a number of units of time and then woken up again. The process "priority" is to be used to order the time at which the process is awakened.

The solution from Hoare’s paper, Comm. ACM 17(4) Oct 74 is:

alarmclock: monitor

begin now : integer :=0;

wakeup : condition;

procedure wakeme (n: integer);

begin alarmsetting: integer;

alarmsetting :=now+n;

while now<alarmsetting do WAIT (wakeup, alarmsetting);

SIGNAL (wakeup); % in case the next process is due to wake up

% at the same time end;

procedure tick;

begin now := now+1;

SIGNAL (wakeup)

end; end alarmclock:

11-4 The process priority described above for exercise 3 can also be used to wake up processes in a suitable order for writing to the cylinders of a disk. Assume that a sweep or "elevator" algorithm is to be used to control the disk heads: that is, the heads are to move across the surface to the outermost required cylinder in one direction then are to move back in the opposite direction. The heads sweep smoothly across the surfaces and back, stopping at cylinders en-route for processes to make data transfers. Write a monitor with a procedure to request that the heads be moved to a given cylinder and to block the invoking process until its required cylinder is reached, and a procedure that is called by a process after it has finished making a data transfer to a given cylinder.

The solution from Hoare’s paper, Comm. ACM 17(4) Oct 74 is given below. We have type cylinder = 0 ..cylmax;

The solution also uses QUEUE (condition-name) which yields true if any process is waiting on the queue and false otherwise. A solution may be programmed to use simple counts instead, as in Chapter 11, because managing a count and performing a SIGNAL or WAIT are atomic within a monitor procedure.

diskhead: monitor

begin headpos: cylinder := 0;

direction: (up, down) := up;

busy: boolean := false;

upsweep, downsweep : condition;

procedure request (dest : cylinder);

begin if busy then

{if headpos<dest or headpos=dest and direction = up

then WAIT (upsweep, dest) else WAIT (downsweep, cylmax-dest) };

busy := true; headpos := dest end request;

procedure release;

begin busy := false;

if direction = up then

{if QUEUE (upsweep)

then SIGNAL (upsweep)

else { direction := down; SIGNAL (downsweep) }

}

else if QUEUE (downsweep)

then SIGNAL (downsweep)

else {direction := up; SIGNAL (upsweep) }

end release; end diskhead;

11-5 Rewrite the "readers and writers" monitor operations to give priority to waiting readers over waiting writers. The application is such that it is better to read stale data quickly than to wait for up-to-date data.

read-write: monitor

entry-procedures startread, endread, startwrite, endwrite

var ar : integer % we now need to know if there are active readers

busy-writing : boolean;

free-to-read, free-to-write : condition;

% in startread a reader now waits for a writing writer tofinish

procedure startread ()

begin ar := ar +1;

if busy-writing then WAIT (free-to-read);

SIGNAL (free-to-read) % If one reader can read, all can read. Each

end startread; % reader wakes up another until none is left.

% in endread the last reader signals a writer

procedure endread ()

begin ar := ar-1;

if ar = 0 then SIGNAL(free-to-write) end endread;

% in startwrite a writer now waits for a writing writer to finish or for no waiting readers

procedure startwrite ()

begin if busy-writing or ar > 0 then WAIT (free-to-write);

busy-writing := true

end startwrite;

% in endwrite any waiting reader is now given priority over any waiting writer

procedure endwrite ()

begin busy-writing := false;

if ar>0 then SIGNAL(free-to-read)

else SIGNAL (free-to-write)

end endwrite;

end read-write;

11-6 Why is a monitor inadequate for managing many shared objects of the same type? How could mutually exclusive access by processes to the same data object be provided?

The mechanism for achieving exclusive access to shared data used in a monitor is to enforce that only one process at a time may be active within the monitor. That is, the exclusion is associated with executing the code of the operations through which the data object is accessed. We have seen a possible implementation in terms of semaphores. Each operation starts with a WAIT on the single semaphore which guards the data encapsulated by the monitor and ends with a SIGNAL on that semaphore.

A semaphore could be associated with each data object of the shared type encapsulated by the "monitor". Each operation must have an argument which specifies which object of that type is to be invoked. The operation includes a WAIT on the appropriate semaphore. Simultaneous accesses to different objects can now go ahead in parallel but each object is accessed under exclusion.

11-7 Why is "synchronisation at the level of operations" desirable?

How might this approach be supported in a concurrent programming language? Consider the cases of both passive modules (or objects) and active modules (or objects).

Condition synchronisation within the operations of a monitor is low level to code and (comparable with semaphores) is difficult to program correctly. Also, the programmer is relied on to maintain the invariants on the state of the shared data whenever a WAIT is executed. There are also problems associated with the semantics of a SIGNAL within a monitor operation - which process should run after a SIGNAL?

Synchronisation at the level of operations avoids all of these problems. Path expressions provide a syntax for declaring (statically) the order in which operations may be executed and which operations may run in parallel. Guarded commands provide an approach through which a server process may determine dynamically which of its operations it may execute on behalf of its clients. Several concurrent programming languages are based on them (Ada, DP).

11-8 Discuss how dynamic process creation can be used to avoid unnecessary delay in a monitor based system.

A process which calls a monitor procedure risks delay. If the process has work it can do in parallel with the execution of the monitor procedure it may create a child process to make the call and synchronise with it to pick up the results of the call.

11-9 The sleeping barber problem.

(We assume that the barber and his customers are male).

A barber provides a hair-cutting service. He has a shop with two doors, an entrance and an exit. He spends his time serving customers one-at-a-time. When none are in the shop, the barber sleeps in the barber’s chair.

When a customer arrives and finds the barber sleeping, the customer awakens the barber and sits in the barber’s chair to receive his haircut. After the cut is done, the barber sees the customer out through the exit door.

If the barber is busy when a customer arrives, the customer waits in one of the chairs provided for the purpose. If they are all occupied he goes away.

After serving a customer the barber looks to see if any are waiting and if so proceeds to serve one of them. Otherwise, he sleeps again in his chair.

Write a solution to the problem for the barber process and the customer processes.

A solution along the lines of the semaphore solution given for exercise 9-6 is as follows (note the translation from semaphore program to monitor program):

barbershop: monitor

waiting : integer := 0; % customers waiting to be cut

customers : condition :=0; % for the barber to wait for a customer

barber : condition := 0; % for a customer to wait for the barber

procedure seek-customer( ) % called by the barber

begin if waiting=0 then WAIT (customers); % sleeps if there are none

waiting := waiting-1; % one less customer waiting

SIGNAL (barber); % frees a waiting customer

end seek-customer ;

procedure get-haircut( ) % called by a customer

begin if waiting < chairs % is there a free chair to sit and wait?

% if there are no free chairs just go away

then { waiting := waiting+1; % one more customer waiting

SIGNAL (customers) % in case the barber is asleep

WAIT (barber); % wait for your turn with the barber

}

end get-haircut;

end barbershop;

Additional exercises

11-10 Solve the dining philosophers problem using a monitor, see Section 17.4.2 and exercise 17-5.

A number of solutions are given with the solutions to Chapter 17’s exercises as part of a small project exploring this problem.

11-11 Rewrite the solution to the readers and writers problem given in Figure 11.1, using conditional critical regions, to avoid a separate region in which writers write. Hint: the Boolean writelock may be managed within the four critical regions and incorporated into AWAIT statements. Compare your solution with the monitor solution given in Section 11.2.3.

Chapter 12 IPC and System Structure

Exercises

12-1 To what extent can processes which run in separate address spaces make use of kernel-provided semaphores to support their co-operation? To what extent can processes which share an address space or part of an address space make use of this facility?

Processes in separate address spaces may use the WAIT and SIGNAL semaphore operations to co-operate provided they are able to share a naming scheme for the semaphores. The kernel must assist in extending each process’s name space by these shared semaphore names. An example is that the names of semaphores associated with processes are well known. They may then synchronise by means of WAIT and SIGNAL. The kernel assists by blocking and unblocking the processes in the implementation of the operations.

We assume that the processes which share an address space are known to the kernel. Synchronisation may be achieved as described above. If the kernel-provided semaphores may be created by processes, a semaphore may be created to guard a shared data object. The co-operating processes synchronise their accesses to it via WAIT and SIGNAL. Arbitrary data sharing may be achieved by this means.

12-2 How would you expect a user level process to request a service from an operating system in a procedural system and in a non-procedural system? How might the hardware constrain or support the choice of system structure? When do context switches take place in both cases?

In a procedural system a process enters the operating system to carry out the service for itself. There is a protection domain change (effected by a TRAP or similar software interrupt instruction) but not a process switch.

In a non-procedural system a process sends a message to an operating system process to request service. There is a context switch to an OS process.

Hardware points: context switching overhead, software interrupt provision, at least two privileged states, memory protection support, see for example the assumption in the MIPS R2/3000 architecture that the OS occupies half of a 4Gbyte address space (Figure 8.3).

12-3 What is meant by the assertion that procedural and non-procedural systems may be viewed as duals of each other?

Give the duality mapping of: monitor, monitor procedure, condition synchronisation on a condition variable in a monitor, process, monitor procedure call, dynamic process creation.

Outline how the monitors given as examples in Chapter 11, including the exercises, may be rewritten as processes. (Details of how to do this are given in the next chapter.)

Lauer and Needham (1978) "On the Duality of System Structures" answers this question in detail.

A server, invoked by message passing, is considered as the dual of a monitor. Chapter 12 shows that a server may have a number of ports, each associated with an operation that it will carry out on request from a client. At this stage a request message could be assumed to contain the name of the operation the client requires as well as the arguments of the operation.

The most difficult aspect is to mirror condition synchronisation within a monitor procedure. Lauer and Needham assume that the server may define a "port set" dynamically. It will accept a message on any port in the set. If a resource becomes exhausted the port at which it is requested is removed from the set. If a resource becomes available again its port is added to the set. This achieves synchronisation at the level of operations.

The monitor examples can be reworked on this basis or may be done after Chapter 13.

Chapter 13 IPC without Shared Memory

Exercises

13-1 Contrast the UNIX pipe facility for same-machine, cross-address-space IPC with message passing.

The UNIX pipe is designed to make IPC look like file and device I/O. An arbitrary number of bytes may be sent to or received from a pipe at any time. There is no concept of "the end of this communication". Any structuring must be done by the application.

Message systems pass blocks of data from a source to (a) destination specified in the message header. The message transport service does not interpret the message body so, again, the application must impose any internal structure on the message contents. A message is defined as a discrete unit, unlike the stream model of a pipe.

Both pipes and messages allow sender and receiver to synchronise. WAIT for a message or read more bytes from the pipe - if it’s empty you are blocked.

You may be blocked on sending to a "full" pipe (some system determined size). The system may take steps to prevent you from sending too many messages, see Chapter 24.

13-2 Suggest ways in which memory management hardware could be used for passing large messages between processes in separate address spaces. Consider both the case when the message is available to both sender and receiver after the message is sent and the case when the send involves only a transfer from one address space to another. Which semantics do you think should be the default and why?

The pages of the message (which may have been acquired for this purpose before the message was created) are mapped into the receiver’s address space to make the message available to it. The convention may be that the message is first mapped into a system area so that the processes (page tables) are not constrained to be in main memory at the same time.

Ideally, one should be given a choice of share or transfer semantics. One scenario is that sender and receiver may wish to co-operate over processing the message data which has become shared. Another is that the sender should not be allowed to change the message data once it has been sent. Systems differ in their assumptions. See Chapter 22 for examples.

13-3 Suggest two ways in which the handling of message passing between processes may be unified with process-device interactions.

1) Each device becomes equivalent to a process. Device handlers do output by message passing to the device; the system interprets the process name as a device and carries out the I/O. An interrupt causes a message to be sent to the (single) device handler process. A processes model of the world is that all interaction is achieved by message passing.

2) The implementation of message passing may be unified with the event handling mechanisms of the system. Arrival of a message is considered as a type of event.

13-4 What problems do you think would arise if synchronous message passing was used to implement interactions between client processes and heavily used servers. How would you solve these problems?

Clients would be delayed for long periods waiting for service. It is out of the question that servers should wait for clients to synchronise in order to return the results of the service invocation. We must have buffering. As this is not provided by the system it must be implemented at the process level.

13-5 How can language-level type checking be included in a message passing system?

It is straightforward to define a message structure, for example as a record with typed fields. What is required, in addition, is a mechanism for type checking between sender and receiver.

One approach is to use output ports and input ports. A port can be declared as taking only messages of a specified type. This type checking is internal to the process. It remains to make a binding between an output port and an input port at which stage the types of the ports are checked. This could be done statically at system configuration time (as in Conic) or dynamically at run-time by means of a system naming and binding service. Section 15.9 gives an example in the context of an RPC system, the ANSA interface trader.

If the system is not written in a single language the problem of cross-language type-checking arises. This is difficult to solve in general.

13-6 In what ways can the naming of a specific destination process be avoided in a message system design? Why is this desirable?

By use of an indirection such as a channel name or global port name (that is, a port which is not bound to one specific process). This allows a number of anonymous processes to provide an equivalent service or for one process to be substituted for another one, perhaps on a failed component in a distributed system.

13-7 What is meant by multicast? What kind of naming service would be needed to support multicast?

A single SEND causes a multicast message to be sent to a list of recipients. It would be necessary to have a registration service so that names could be associated with groups, effectively distribution lists. A decision would have to be taken on whether lists are to be recursively structured, that is, can we have a component of list which is itself a list and so on. These issues are common at user level, in electronic mail services for example, but might become too heavyweight as part of an IPC mechanism.

A broadcast medium at the LAN level (such as Ethernet) provides a convenient optimisation of multicast. This is not the general case however.

The decision on whether to support multicast relates to application requirements and protocol design.

13-8 Why might it be important to be able to specify a limit on the time that one is prepared to wait for a message?

One might have other tasks to complete by a certain deadline or just alternative work to be getting on with.

There may not be a conditional wait and the timeout may fulfil this purpose. Perhaps there is no message and never will be, for example, one may have to check for an error condition periodically.

13-9 Use the Linda primitives introduced in Section 13.9 to program the interaction between a single centralised file server and its many clients.

The basic structure of a client and a server are:

server: in (S, who:name, in-formals )

[body of server S;

out ( who, out-actuals )

]

client call: out ( S, me, out-actuals ); % send request

in (me, in-formals); % get reply

Gelernter (1985), Carriero and Gelernter (1989), Andrews (1991) and Ben Ari (1990) give a number of Linda examples.

Chapter 14 Crash Resilience and Persistent Data

Exercises

14-1 How do you think an operating system which has bugs and is about to crash might behave differently from the fail-stop model described in Section 14.2?

Why is it unlikely that an operating system that is about to crash will write all over the disks it manages?

Is it more likely that an operating system that is about to crash will write all over main memory? Note that such erroneous writes may be to non-volatile memory that is being used as a cache for data and metadata waiting to be written to disk.

How could it be made less likely that a non-volatile cache would be corrupted in this way?

We are concerned that if we take steps to preserve main memory it might have been corrupted by a failing OS. A failing OS might write at random in main memory and probably has the privilege to write anywhere.

We have seen how disks are programmed: driver code sets up the main memory and disk addresses for a required transfer of one or more blocks. Although a failing OS could cause an isolated error, by corrupting a disk buffer, for example, it is very unlikely to cause many erroneous writes to take place before being stopped.

If we use a non-volatile write cache we should protect it from a failing OS. Needham et al. proposed a protocol in microcode. The idea was to make the OS perform a simple test before writing to NVRAM. An alternative approach is to write-protect NVRAM so that the OS must change the protection of the required page correctly before writing to it.

14-2 Define "idempotent operation" and "atomic operation". Give examples of operations that can and cannot be made idempotent. How can an operation that cannot be made idempotent be made atomic?

An idempotent operation is repeatable. No matter how many times you carry it out the same effect is achieved. An atomic operation is implemented so that either all or none of it is done. We make an atomic operation repeatable by ensuring that, in the event of a crash, for example, when we do not know whether an operation wrote to persistent store or not, we can restore the system state to its value before the operation started. The operation can then safely be repeated.

x := 3 is idempotent and machine-level atomic

x := x+1 is not idempotent and not necessarily atomic, depending on

the machine level implementation of the instruction.

"Write this block of data at offset n" in a data structure in main memory is idempotent but not machine-level atomic. That is, concurrency control is necessary if the operation of writing the block is to be guaranteed to be carried out without interference from concurrent processes.

"Write this byte sequence at byte-offset n" in a file is idempotent. It is atomic if only one disk-block needs to be written but not otherwise.

An operation can be made atomic by recording the system state before the operation so that if it does not complete the system can be returned to its original state.

14-3 How would you support atomic operations in a system which updates persistent data values "in place"; that is, the new values overwrite the old values in the persistent store.

Record each operation in a log with the persistent data values before and after the operation.

14-4 What is meant by a shadow copy of a portion of a filing system?

We have seen that the filing system uses disk pages as the unit of allocation and transfer. Suppose that when a file is changed it is not updated in place. New pages are used and a new version of the metadata is created. When the new metadata overwrites the old on disk (we assume in a single block transfer) the change has been effected and the old pages can be recovered as free pages. Before this happens both the old and new versions of the file are achievable. Note that the old and new versions of the metadata may point to some common pages and some different ones, in which case a page is said to have a shadow page.

Chapter 15 Distributed IPC

Exercises

15-1 Compare and contrast a client server model of distributed computation with an object model.

Could a pipeline be said to adhere to either model? How would you implement a pipelined compiler in a world based on client-server interactions or object invocations?

Chapter 15 starts with a discussion of client-server and object models. A pipeline does not naturally adhere to either model since both assume a request-response flow of control rather than a stream. One can implement each stage of a pipeline as a server with the client as the previous stage. The server-components must be programmed to take a request and acknowledge it immediately in a reply (assuming that a reply is mandatory, as in most RPC systems) so as not to delay the previous component while it carries out processing.

15-2 How can the components of a distributed system agree on a value to use for system time? Under what circumstances might such a value be needed?

The nature of distributed systems is that there is no universal time. The components may use a clock synchronisation protocol to keep their clocks reasonably close together. There may be a requirement to be able to make a decision such as "who asked first". An example is the distributed mutual exclusion protocol outlined in the appendix. There is no correct answer to the question but for the purposes of algorithm design all we have to do is ensure that every component takes the same decision about the order of events. For example. each component uses a timestamp generated from its local clock. There is system-wide agreement on the ordering of the components. No two timestamps from the same component can be the same (by definition). If timestamps deriving from different components are the same they are resolved on the basis of the agreed component ordering.

15-3 A process on node A invokes an operation on node B in a distributed system.

How might the fact that the nodes, and the network connecting them, may fail independently of each other be taken into account in the design of the distributed invocation at the language level and at the communications level?

Can you think of examples where it is desirable that the clocks at nodeA and nodeB should be kept in synchronisation?

At the communications level we use timeouts. A protocol may retry a few times on timeout expiry. On repeated failure a failure return is made to the application.

At the language level we have to be prepared for a failure return.

The language may allow user-defined exceptions and exception handlers.

Suppose clients and servers synchronise their clocks on service invocation and that an RPC contains a timestamp. If the server crashes and restarts it can distinguish between a new, post-crash, call from a client and a repeat of a pre-crash call. See also exercise 15-10.

15-4 For the styles of IPC studied in Part II, explain how the mechanism might be extended for inter-node IPC.

Consider how the network communication software and the local IPC support might be integrated.

Focus on the naming schemes that are used by the centralised IPC mechanisms. Consider the problem of naming when the mechanisms are extended for distributed IPC.

When might the name of a remote operation be bound to a network address. How could this binding be supported at kernel, local service, or dedicated remote service level?

One approach is to detect a non-local destination in IPC and invoke communications services. This might take place within the kernel or through a user-level process or server which stands in place of all non-local destinations. See, for example, the Accent and Mach kernels.

Another approach is to make local IPC a special case of network communication, as in UNIX BSD, see Section 23.16.

A global naming scheme for IPC destinations is required if IPC is to be distributed. Section 15.9 covers some approaches to providing a naming, registration and binding service and gives the method used by ANSA as an example.

15-5 Security involves both authentication and access control (when you invoke a remote operation you should be able to prove that you are who you say you are (authentication) and there should be a check that you are authorised to invoke that operation on that object (access control)). Consider how both of these functions are supported in a centralised, timesharing system and in a single user workstation. What infrastructure would be needed to support these functions in a distributed system?

In a centralised system you typically authenticate yourself by password when you login to the system. This is potentially insecure if your password is transmitted as plaintext (without encryption) from your terminal to the host. Also, if you are using a terminal emulator rather than a dumb terminal, someone may have inserted a password grabbing program.

The method used for authorisation (access control) depends on the type of object you wish to use. Access control lists associated with an object indicate WHO may use it and in what way (and you have proved who you are in a once-for-all authentication at login). Certain objects, such as memory segments, may be created dynamically and associated with a process you have created to run a program. Access control for these is set up in the memory management hardware.

A secure distributed system would have an authentication service with encryption key management and encrypted communication would be used. A full discussion is beyond the scope of this book but a point to emphasise is that most distributed systems are not highly secure. In the context of RPC a danger is that an erroneous or malicious client could waste server time, even if the requests for service were rejected.

Students could be asked to design a registration and "login" service for users of a small localised distributed system.

15-6 How can compile time type checking be supported in an RPC system?

How can an RPC system check that you are not attempting to call a remote procedure that has been changed and recompiled since your program was compiled and type checked against it?

Interface specifications of remote procedures must be available for compile time type checking. This might be through shared libraries of public service interfaces and those defined by components of a distributed application.

One method is to associate the identifier of the remote interface with the call expression on compilation, see 15.7.2 and to pass this identifier with each call at run-time so that it may be checked.

The approach taken by ANSA is given in 15.9.3. Here, the provider of a service exports its interface to a system server (interface trader) and is returned an identifier. Clients import the interface and are given the same identifier which may be checked on an RPC. The provider may later withdraw or change the interface and the client’s identifier will become out of date.

15-7 Can RPC be used in a heterogeneous environment?

An aim of standards organisations is to make interworking of heterogeneous systems possible. There are standard protocols for remote execution, for example, the ISO REX protocol, and standards for external data representation, for example, ASN.1 (ISO, 1986).

15-8 How could you contrive to program an application in which massive amounts of data are to be transferred according to a "stream" paradigm in a system which supports only RPC for distributed IPC. You may assume that multi-threaded processes and dynamic thread creation are available.

A naive use of an RPC scheme is that a block of data would be sent and acknowledged, then another block and so on. A disadvantage is that the sender may not want to wait for an acknowledgement before sending the next block in sequence. This disadvantage is minimised if very large amounts of data are sent in an RPC. The problem then is that the receiver may be able to start work as soon as a small amount of data arrives and large packets occupy too much buffer space.

If the RPC system allows many threads of a process to use the service at the same time, a number of threads could be used to send a sequence of data blocks. As soon as an acknowledgement arrives for a thread, that thread could send off the next block of data.

15-9 Is distribution transparency fully achievable in an RPC system? What software components can be used to provide this illusion? How could the independent failure modes of system components be handled in an RPC system where distribution transparency was a design aim?

Suppose that at the language level the program is written without distinguishing between local and remote procedure calls. A pre-processor could detect unknown local procedures and, if all is well, insert calls to library routines to invoke remote procedures. A certain amount of exception handling could be done at this level and a call retried, for example, but if a failure has occurred the main program cannot continue. That is, in the absence of failures the scheme as presented will work. If there are failures the program has to deal with them and transparency is violated.

As discussed in 15.7.1 certain arguments do not make sense for a remote call. An RPC design aiming for transparency has to decide what to do about reference parameters and so on.

15-10 Is it desirable that an RPC system should provide support for client, network and server crash and restart? Distinguish between "exactly once" semantics in the absence of crashes and restarts and in their presence. What are the implications on the application level of providing universal "exactly once" semantics? What are the likely implications on the performance of all RPC’s?

In general, it is not desirable to attempt to handle crash and restart transparently. One would be setting up a transaction system, supporting atomic RPCs, which is not appropriate at this level.

Exactly once semantics in the absence of crashes indicates that the RPC communications level will retry on timeout to attempt to succeed in making the call in spite of server load or network congestion. Eventually failure is reported to the application. If the server has crashed, this might have been part way through a call. Exactly once semantics in the presence of crashes would indicate that the server, on restarting, should return its state to that before such a call started. This would require the server to support atomic operations by a method such as those described in Chapter 14. It would also require the RPC system to be able to store persistently the calls in progress at all times in case of crashes.

PART III CONCURRENT COMPOSITE ACTIONS

Chapter 16 Decomposable Abstract Operations

Exercises

16-1 What problems might occur if the sub-operations of a single high level operation are started off in parallel (for example, for running on a shared memory multiprocessor)? Give examples. Why is it difficult to automate this approach?

The semantics of the high level composite operation may be such that an ordering is required on the sub-operations. It is difficult to automate this approach to achieving concurrency because the orderings depend on the application. For example, it may be necessary to read several values, perform a computation and write a value which is subsequently used by other sub-operations. A simple example is transfer requiring check-balance and debit to be done before credit.

16-2 Find out about the consistency checks your local operating system runs on its filing system on restarting after a crash or shutdown (an example is UNIX’s fsck maintenance tool). Is it possible for users to lose data? If so, how are they informed of this possibility? Why is this approach not possible in a transaction processing system?

This is system dependent. It is common for filing systems to scavenge and tell you when they have had to make an attempt to recover a consistent version of a file which may not reflect all of the work done when the crash occurred. You may have had successful returns from system calls which write to a file but still lose this data on a crash.

Transaction systems must guarantee that when the client is told that a transaction is done (committed) the result will persist in spite of any subsequent crashes or media failures. The system updates the persistent store then acknowledges commit to the client. A crash during a transaction will automatically imply abort. Because each operation is atomic, values can be restored to the state prior to the transaction whatever had happened when the crash occurred.

The requirements of transaction systems make file-system-style consistency checks inappropriate and the implementation of transaction systems makes them unnecessary.

16-3 To what extent is it desirable to run the sub-operations of different high level operations in parallel? What are the possible problems arising from uncontrolled concurrency? (Give examples of incorrect output and incorrect system state). What are the possible consequences of attempting to solve these problems?

This question reinforces Section 16.5 where examples are given. Uncontrolled concurrency can lead to incorrect output, for example, inconsistent, (uncommitted) system state may be seen, and incorrect state, for example because of lost updates.

If we allow interleaving in general but attempt to solve the problems by allowing objects to be locked by transactions, deadlock might arise. Other approaches to solving the problems might lead to many aborted transactions and complex, inefficient, procedures for recovering system state in order to effect these aborts.

16-4 What are the possible effects of a crash part-way through a single composite operation? What mechanisms must be in place to allow for the possibility of a crash at any time in a transaction processing (TP) system? Are these mechanisms suitable for handling concurrent execution of composite operations as well?

The point to emphasise here is that each component operation is implemented atomically and some may have completed when the crash occurs (in contrast with the single atomic operation of Chapter 13). If the partially complete high level operation is to be undone then all its component operations must be undone, even those that have finished.

If we have to be in a position to undo any operation in order to recover from a crash (either by restoring prior state or by applying an inverse operation) we can perhaps use the same mechanism to undo operations when concurrent execution has caused incorrect values to be recorded or used. This will have to be detected.

16-5 Why is it important, when designing a TP system, to have estimates of:

the likely level of load on the system;

the probability that requests for simultaneous access to the same data items will occur.

In order to choose an appropriate method of concurrency control. Can we use an optimistic method, taking the risk that a conflict will not occur? Or should we be pessimistic and assume that conflicts are likely? If the load is very light we can use a simple but crude method based on locking objects for long periods.

In some applications the data will have hot spots which are accessed frequently, even though conflict over most of the data may be unlikely.

16-6 Consider how you would set up an object model for a TP system. What are possible disadvantages of treating read and write as operations on data objects?

This is arguing for an O-O approach to database design. The approach works well through Part III for reasoning about the behaviour of systems. The definition of conflicting operations is more meaningful in terms of the operations on the object rather than just reading and writing its value. It is useful to refer back to the atomic operations in main memory discussed in Part II. Reading and writing locations in main memory are the components of a higher level operation such as "add record to linked list".

When we come to recovery in Chapter 20 we require operations to be undo-able and redo-able and that undo and redo are idempotent in case of a crash during recovery. In that chapter we move to an implementation of recovery based on recording object state.

Chapter 17 Resource Allocation and Deadlock

Exercises

17-1 Consider a wide range of examples of objects or resources that are allocated by a system to meet the demands of the application or applications that it runs. Consider the extent to which these demands are predictable in advance. Consider real-time systems, multi-user (distributed) operating systems, database management systems etc.

This chapter focuses on algorithms that can be used to solve general resource allocation problems. There are often simpler solutions to specific concurrent programming problems, such as the dining philosophers problem. Other large concurrent programs, with many locks, semaphores or monitors, for example, will require a more general approach.

The subject is often discussed for the resources allocated dynamically by operating systems such as certain devices, disk space, memory and so on. In batch systems a "job" had to specify its resource requirements as part of the job control information. In interactive systems information is passed from compiler to linking loader to OS to indicate some of the resource requirements of a program that is to run, for example the amount of memory it will need.

In database systems the resources that are allocated dynamically are the data objects stored by the system.

17-2 What are deadlock, livelock and starvation? Give examples.

Deadlock implies that a blocked cycle of processes exists. Each holds at least one resource and is requesting another resource which is held by another process belonging to the cycle. Chapter 17 gives some examples.

Livelock implies that a process is not blocked but that it will never progress to its next stage. An example is busy waiting on a flag that no other process will ever change. Another is an incorrect entry protocol for a critical region where the processes execute the protocol indefinitely but none enters the region.

Starvation implies that a process is being unfairly treated with respect to other processes. A process may be blocked waiting for a resource that, for some reason, is always given to some other process.

17-3 What are the four conditions that must hold for deadlock to exist in an object allocation system? Which of these are system policies? Which is concerned with a specific sequence of events that happens to have occurred?

Is it possible to ensure that any of these conditions can be guaranteed not to hold?

This is reinforcing the material presented in 17.4. It is not always possible, depending on the application. For some applications one can ensure that one of the conditions does not hold.

17-4 Consider the following alternative specifications of the resource requirements of a process. In each case the process makes requests dynamically for the objects it needs.

a) No resource requirements are specified in advance.

b) The total resource requirements are specified.

c) The order in which the resources are required is specified.

d) The order in which resources are acquired and released is specified.

Discuss how these levels of information on resource requirements could be used to handle the possibility of deadlock. What are the tradeoffs involved in deciding on the amount of information to hold and the amount of processing to be done on it.

a) We can only detect deadlock.

b) We can avoid deadlock.

c) If this is a system-defined order so that all processes acquire their resources in the same order we have prevented deadlock by making a cycle impossible.

d) First assume a static number of processes such as in a process control system. It is possible to define a total schedule for the processes comprising steps or phases such that a process moves to a new phase when it releases a resource or requires a resource. A system state comprises a phase of each process. An algorithm is desired to take the system from its starting state to its finishing state (or the end of a period after which it restarts) without entering an unsafe state.

17-5 Devise the following solutions to the dining philosophers problem of Section 17.4:

a) Take the semaphore program given in Section 17.4 as a starting point.

Explore the use of an additional semaphore to achieve mutual exclusion, either to ensure that both forks are picked up at once or to simulate a room which the philosophers enter one-at-a-time to eat. Adapt this latter solution to allow four philosophers at once to enter the room.

b) Write semaphore programs which break the symmetry. Let odd-numbered philosophers pick up their forks left then right and even numbered philosophers right then left. An alternative is that just one philosopher picks up his forks in the opposite order.

c) Write the monitor "solution" that is equivalent to our first attempt, given in Section 17.4, that is susceptible to deadlock.

d) Write a monitor solution that simulates a room in which the philosophers eat one-at-a-time.

e) Write a monitor solution that allocates forks so that only four philosophers at once may be attempting to eat.

f) Write a monitor solution such that philosophers request both forks at once.

g) The above solutions are specific to this problem. Explore the use of the general approaches to deadlock detection and avoidance described in this chapter. For example, a fork-allocation monitor could run a deadlock detection algorithm whenever it could not satisfy a request for a fork. Or each process might register a claim for the total resources it might ever require before starting to run its algorithm. A fork allocation monitor might then run a deadlock avoidance algorithm. Note the length of these solutions.

The problem is a classic in the literature on concurrent programming. Although it is intended primarily to point out the dangers of writing symmetric solutions for concurrent processes (problems may arise if each process i executes the same program) we shall also use it to illustrate the general approach to deadlock detection and avoidance given in Chapter 17. We first give some solutions to the specific problem using semaphores and monitors.

The following variables are shared by all processes:

varfork: array (0..4) of semaphore % initialised to (1, 1, 1, 1, 1)

We use @+ to indicate addition modulo 5, where the processes have id’s 0..4.

A first attempt at a solution is for each process (identifier i) to execute the following program:

repeat

think

WAIT (fork(i));

WAIT (fork(i@+1));

eat

SIGNAL (fork(i));

SIGNAL (fork(i@+1))

until false;

The problem is that the processes might run concurrently in strict synchronisation so that each succeeds in picking up its left fork then requests its right fork. We then have deadlock.

A solution is to use an additional semaphore to achieve exclusive execution of the two WAIT operations:

guard: semaphore := 1

repeat

think

WAIT (guard); % put a critical region round

WAIT (fork(i)); % the two WAIT (fork) operations

WAIT (fork(i@+1));

SIGNAL (guard);

eat

SIGNAL (fork(i));

SIGNAL (fork(i @+ 1))

until false;

We can invent stories about the philosophers entering the room with the table one at a time in order to eat, in which case we would put SIGNAL (guard) after the two SIGNAL(fork) operations in the above solution instead of before eat, so that the philosophers eat in the room, put down their forks and leave it. Carriero and Gelernter (1989) give this solution in terms of the Linda primitives (Section 12.9.2).

Although this solution avoids the problem of deadlock it is unnecessarily conservative in restricting concurrency. We could allow some combinations of philosophers to eat together, for example, 0 can eat with 2 or 3 but not with 1 or 4. If we allow four processes to enter the room at once, any one might be delayed, waiting for a fork, but would eventually be able to eat:

count: semaphore := 4

repeat forever

think

WAIT (count); % only 4 at once can pass this point

WAIT (fork(i));

WAIT (fork(i@+1));

eat

SIGNAL (fork(i));

SIGNAL (fork(i@+1));

SIGNAL (count); until false;

If we break the symmetry of the processes’ programs we can avoid the possibility of a cycle of deadlocked processes. We can do this by making one process acquire its forks in the opposite order to the others, or we can make processes with odd identifiers pick up the forks in one order and even ones in the other order:

if odd(i) then....else.....;

The semaphore WAIT operation does not allow us to explore solutions where we release the first fork if the second is already in use.

Let us now move on to monitor solutions to the problem. Going back to our first attempt, a similar monitor "solution" is:

fork-allocator: monitor

var fork:array (0..4) of integer range 0..1 % initialised to (1, 1, 1, 1, 1)

fork-free: array (0..4) of condition; procedure acquire-fork (i)

if fork(i)=0 then WAIT (fork-free (i)); fork(i) :=0;

end acquire-fork (i);

procedure release-fork (i)

fork(i) :=1;

SIGNAL (fork-free (i))

end acquire-fork (i); end fork-allocator;

And each process (i) executes:

repeat

think

fork-allocator. acquire-fork(i);

fork-allocator. acquire-fork(i@+1);

eat

fork-allocator. release-fork(i);

fork-allocator. release-fork(i@+1) until false;

Our solution which allows only one process at once into the room can be rewritten with a monitor representing the room:

room-1: monitor

var fork:array (0..4) of integer range 0..1 % initialised to (1, 1, 1, 1, 1)

procedure eat (i)

fork(i):=0; fork(i@+1):=0; eat; fork(i) :=1; fork(i@+1):=1;

end eat (i) end room-1;

And a philosopher executes:

repeat

think; room-1.eat(i);

until false;

The monitor shows the degree to which concurrency is restricted more clearly than the semaphore solution. The possibility of contention for resources is removed.

If we allow four at a time to eat we cannot put eat inside a monitor. We have to make our story include a means for philosophers to acquire permission to start eating.

eat-4: monitor var fork:array (0..4) of integer range 0..1 % initialised to (1, 1, 1, 1, 1)

fork-free: array (0..4) of condition; eat: condition; count: integer :=4;

procedure start-eat(i)

count:=count-1; if count=0 then WAIT (eat); % are four already attempting to eat? if fork(i)=0 then WAIT (fork-free (i)); fork(i) :=0; if fork(i@+1)=0 then WAIT (fork-free (i@+1)); fork(i@+1) :=0;

end acquire-fork (i);

procedure end-eat(i)

fork(i) :=1; fork (i@+1)=1

SIGNAL (fork-free (i)); SIGNAL (fork-free (i@+1)); count := count+1; SIGNAL (eat) end end-eat(i);

end eat-4;

And each process (i) executes:

repeat

think

eat-4.start-eat(i);

eat

eat-4.end-eat(i)

until false;

Other monitor solutions are possible. Students may be asked to devise a monitor solution which causes a process to free its left fork if its right fork is busy and one which allows a process to acquire both forks in one call to a monitor procedure acquire-forks (i).

Notice that the monitors which avoid the possibility of deadlock are written to solve this specific problem. In the monitor fork-allocation, where deadlock is possible, there is no knowledge of the resource requirements of the processes in the monitor. It attempts to satisfy each request that arrives and blocks a process if its resource request cannot be satisfied.

fork-allocation could be extended to run a deadlock detection algorithm when it cannot satisfy a resource request. The array fork can be expanded into an allocation matrix and a request matrix request (0..4,0..4) must also be maintained. It is therefore necessary for the process id as well as the required fork id to be passed as an argument to the procedure acquire-fork. The pseudo-code below gives the general idea. The code is becoming long because we are taking the general approach of Section 17.6 to solving the problem.

procedure acquire-fork (j,i) % process j wants fork i;

if fork(i)=1 then fork(i) :=0

else begin request (j,i) :=1;

deadlock:boolean := deadlock-detect();

if deadlock then "raise exception"

else begin WAIT (fork-free (i));

fork(i):=0 end end end acquire-fork (i);

We can extend this general approach by making information available within the monitor on the total resource requirements of each process. This might be done statically or, more generally, to require a process to register its total requirements at the monitor before claiming any resource. The general deadlock prevention algorithm could then be run before any request was granted. Again, the program is becoming very long. For example, when processes 0,1,2 and 3 have each acquired one fork and process 4 requests a fork we have:

process allocated forks requested forks total required

0 (1, 0, 0, 0, 0) (0, 0, 0, 0, 0) (1, 1, 0, 0, 0)

1 (0, 1, 0, 0, 0) (0, 0, 0, 0, 0) (0, 1, 1, 0, 0)

2 (0, 0, 1, 0, 0) (0, 0, 0, 0, 0) (0, 0, 1, 1, 0)

3 (0, 0, 0, 1, 0) (0, 0, 0, 0, 0) (0, 0, 0, 1, 1)

4 (0, 0, 0, 0, 0) (0, 0, 0, 0, 1) (1, 0, 0, 0, 1)

available forks (0, 0, 0, 0, 1)

The deadlock avoidance algorithm of Section 17.7 could be run on this data and would show that it is not safe to grant this request.

This example shows the students the general algorithms discussed in Chapter 17 in the context of a simple, specific example.

Chapter 18 Transactions

Exercises

18-1 How can a serial execution of a composite operation be guaranteed? What is a serialisable execution of a composite operation?

A serial execution can be achieved by locking all the objects required by a composite operation before it starts. A serialisable execution of the component operations of a number of composite operations is equivalent (with respect to output and final system state) to some serial execution of those composite operations.

18-2 In a TP system a client submits a transaction, it is done and acknowledged to the client. What must the system guarantee when that acknowledgement is given?

That the result persists irrespective of subsequent system crashes or media failures.

18-3 What are the ACID properties of atomic transactions and how can they be ensured under concurrency and crashes?

See Section 18.4 for a definition of A. C. I. D. It is useful to point out that A and D are the concern of recovery procedures and C and I are the concern of concurrency control.

A is ensured by the ability to recover state.

C is ensured by serialisable executions.

I is ensured by enforcing strict execution schedules in strict 2PL and strict TSO and by OCC when invocations take place in isolation on shadow objects. It should be mentioned that we shall see that I is not necessarily enforced in the implementation; that is, non-strict executions may be allowed. This would increase concurrency at the risk of having to abort transactions.

D is ensured by writing the results of a transaction to persistent store, with whatever degree of redundancy is deemed necessary for the requirements of the application, before commit is acknowledged.

18-4 Relate the system model based on object invocation given in Section 18.6, to the discussion of Section 14.9 on object naming, location and invocation.

This is discussed in the solution to exercise 20-3.

18-5 How does the graph which represents the history of a set of transactions being executed differ from a serialisation graph? What property of the serialisation graph must hold for the transactions that it represents to be serialisable?

An execution history shows all the component operations of the transactions and the order in which conflicting operations take place.

A serialisation graph shows only transactions. A directed arc from one transaction to another in a serialisation graph is drawn to indicate the order of execution of conflicting operations. The serialisation graph must be acyclic for the transactions represented in it to be serialisable.

18-6 Why might the decision to abort one transaction lead to the need to abort another?

If a transaction has seen state that is subsequently undone on the abort of another transaction then it must also be aborted.

Could this happen if the property of isolation of atomic transactions was enforced at all times? No.

18-7 Give some practical examples of conflicting operations on an object.

Our definition of conflict starts from some system state. The final state on executing two conflicting operations differs depending on the order of execution. It is better to focus on state than on output.

Simple examples can be taken from arithmetic.

(a+b)*c (add b then multiply by c) is not the same as a*c+b (multiply by c then add b).

"Calculate the interest on a bank account" conflicts with deposit. Performing any computation on an object’s state conflicts with changing the state.

"Calculate the sum of a number of object values" conflicts with changing any of the values.

"Change the value of an object to X" conflicts with "Change the value of an object to Y". This can be applied to writing new versions of parts of documents, changing part of a chip layout in a CAD system and so on.

18-8 Assume that every operation on an object has an inverse or undo operation. Assume that a number of operations have taken place on an object. When can an undo operation simply be applied to the current state of an object, even if there have been operations on the object since the one that must be undone?

If all the operations subsequent to the one which must be undone do not conflict with it then it may be applied to the current object state.

Chapter 19 Concurrency Control

Exercises

19-1 Why is the provision of lock and unlock operations not sufficient to ensure serialisability of composite operation invocations?

It depends how they are used. Lock, do operation, unlock just makes each component operation atomic (which we assume anyway). There must be an additional policy on the use of locks to ensure that an object is not unlocked "too early".

Why does two-phase locking ensure serialisability?

Once a transaction has locked an object it does not unlock it until it has locked all the objects required by all its operations. This ensures that two transactions must access any objects they have in common in the same order.

Why is two phase locking subject to deadlock? (Consider the four conditions for deadlock to exist, given in Section 19.4.)

Two transactions may have objects in common. One may lock some of the common objects. The other may lock others. When the transactions request the objects held by the other we have deadlock. The four conditions hold.

Why does two phase locking not guard against cascading aborts?

Suppose a transaction locks all its objects then proceeds to invoke them. If it unlocks each object as soon as its invocation is done then that object’s state may be seen by other transactions before this one completes. If this one aborts we must abort all such transactions.

In what way does strict two phase locking guard against cascading aborts?

In strict 2PL the transaction does not unlock its objects until it completes with commit or abort. This ensures isolation.

19-2 Why might the start-time of a transaction not be the best time to use for its timestamp in TSO?

The transaction may spend some time processing before it starts to invoke persistent objects, or it may invoke operations which cannot conflict before potentially conflicting ones. It is then likely to be deemed "too late" at the objects it invokes ( if transactions with later timestamps have now invoked the potentially conflicting operations). (Contrast with OCC where all transactions prepare the state they wish to commit in shadow objects then request commit. In this case the order is first to commit, first served.)

Given the timestamps of two committed transactions can you always draw their serialisation graphs? Does timestamp ordering restrict concurrency more than locking? Discuss. Compare the overhead of implementing locking with that of timestamp ordering.

Yes, the directed arc is from the transaction with the earlier timestamp to that with the later one.

We are concerned only with operations that are potentially conflicting. Let us assume, for the purpose of this comparison, that we do not enforce isolation by a strict execution schedule. (Strictness is for avoiding cascading aborts.)

In a system which employs 2PL for concurrency control, a transaction must wait if an object it requires is locked. We avoid non-serialisable schedules by allowing a number of objects to be held locked. Deadlock may arise and deadlock detection must be carried out. In a system which employs TSO the decision on whether an invocation can go ahead is made immediately and at a single object. This makes a higher degree of concurrency possible. An object is available except when being invoked, yet we have avoided non-serialisable histories and the possibility of deadlock. However, some transaction may have to be restarted with a later timestamp and the response time to that application may be no better than in the locking system. We allow equivalence to only one serial order of execution, that of the timestamps of the transactions.

2PL requires centralised information on the locks held and requested so that deadlock detection can be carried out. In TSO each object decides whether an invocation can go ahead. Overhead in TSO comes from having to abort and restart transactions.

19-3 Why are cascading aborts possible in a system with timestamp-based concurrency control? What extra mechanism could prevent it?

An object is available except when being invoked. A transaction may go on to abort after invoking an object and another transaction may have seen its state in the meantime. We can prevent this if an object only considers accepting an invocation if the transaction which carried out the previous one commits.

19-4 Is concurrency control based on timestamp ordering, or strict timestamp ordering, subject to deadlock?

In TSO no objects are held so deadlock cannot happen. In strict TSO an invocation can be rejected while an object awaits the commitment of the transaction which last invoked it. The method is defined so that cycles are avoided so deadlock cannot happen.

19-5 Why is optimistic concurrency control (OCC) potentially appropriate for use in real-time systems?

A transaction does not have to wait for an object to become available. The only possibility of delay in taking a shadow is if the current state is being updated. Objects are not locked for invocation or for atomic commitment.

Why is it potentially good for systems where contention is rare?

The overhead is associated with restarting a transaction with new shadow objects if conflict is detected on validation.

19-6 What is involved in aborting a transaction in a system which uses OCC?

Restarting it with new shadow objects.

Describe the validation and commitment phases of an OCC scheme. Consider the case where the objects to be committed have had updates committed since the shadow copies were taken. Consider the cases where the updates are the result of operations which do and do not belong to conflicting pairs. What actions should be taken in both these cases?

First, a check is made that the object states represent a consistent system state (objects are not locked for atomic commitment in OCC). If this is OK:

Suppose that changes have been committed at an object invoked by a transaction since its shadow was taken. If these result from operations which do not conflict with that of our transaction, its operation can be applied to the current system state. The validation phase must ensure this over all the objects invoked by a transaction. If conflicting operations have been committed at any object then the transaction is aborted and restarted.

A given object must participate in only one validation at once and must apply updates in the order specified by the validator.

19-7 Suppose that two transactions use copies of the same objects under an OCC scheme. Suppose that both transactions generate output and both are committed. Does the output reflect a serial ordering of the transactions? Does it matter?

The fact that they are committed means that they operated on a consistent system state. The output does not reflect a serial ordering and it does not matter - it is the nature of OCC that several transactions can operate on the same (consistent) system state.

19-8 Consider the example shown in Figure 19.11.

What happens when T3 requests commit?

mult(2) conflicts with add(3). T3 is restarted with a shadow object A=4.

What happens when T1 then requests commit?

add(2) does not conflict with add(3). T1 is committed and its operation is applied to the persistent object A=6.

If T3 requests commit it will be rejected again and restarted with A=6. It may then go on to request commit giving A=12.

19-9 A particular application is known to comprise almost entirely read-only transactions. Discuss the three approaches to concurrency control for such a system.

If we use 2PL we would achieve better performance if the system supports shared read locks. If a transaction requests to convert a read lock into a write lock it may be delayed until other transactions have finished reading. Deadlock can occur over lock conversion. In TSO most invocations will go ahead as read does not conflict with read. In OCC most transactions will succeed in committing as object states are not changed by read-only transactions (read operations commute). Presumably we are using a transaction system because updates are made from time to time, even if they are infrequent.

19-10 For a particular application, transactions are either read-only or have a phase in which they read and compute followed by a phase in which they write their results back to the database. Discuss the three approaches to concurrency control for this application.

The question is under-specified. It does not give us enough information about the type of application area. It may be that the transactions are long, such as in CAD applications. Does the read and compute followed by write comprise a single operation?

Strict 2PL ensures that transactions see only committed system state. It is desirable that read-only transactions should be able to take out shared read locks on objects. Read-write transactions would require to be able to convert a read lock into a write lock; that is, acquire a write lock without releasing the read lock on an object. Deadlock can then arise.

TSO would allow read operations to go ahead since they do not conflict.

The question does not say whether conflict is expected to be rare. If this is the case then OCC is suitable for this style of use. Read-only transactions may take shadows without delay. A check is carried out when the transaction requests commit that the shadows are consistent. If conflict is rare, this is likely to be the case.

19-11 In OCC we defined the version number of an object as the timestamp of the transaction whose validated invocations were most recently applied to the object. The transaction’s timestamp is therefore that of its commit time instead of its start time. Contrast TSO and OCC with respect to the timestamps allocated to transactions and the schedules that are allowed to execute.

In OCC and TSO a timestamp is associated with each recorded state (or version) of an object. In OCC the timestamp is issued by the validator of the transaction which invoked the object when it guarantees that the transaction will commit. Updates must be applied at objects in timestamp order. Shadow objects may be taken at any time. Updates made at an object are those of committed transactions. Transaction abort does not affect the persistent object; shadow objects are discarded.

In TSO a timestamp is associated with a transaction, possibly its start time or when it first invokes a potentially conflicting operation. Each object makes an individual decision on whether an invocation can be carried out on the basis of the transaction timestamp. The transaction co-ordinator (scheduler) will abort the transaction if any invocation is rejected at the object. There is therefore the possibility that invocations must be undone.

Strict TSO keeps the object unavailable until the decision to commit or abort the transaction has been taken. The updates can be applied at all objects on commit with the timestamp of the transaction as version-id.

Chapter 20 Recovery

Exercises

20-1 Consider a TP system crash, in which main memory is lost at the following times:

a) A transaction has completed some but not all of its operations.

b) A client has received acknowledgement that a transaction has committed.

c) The system has decided to commit a transaction and has recorded the fact on stable storage but there is a crash before the client can receive the acknowledgement of commit.

Discuss what the system must do on restart for each of these cases. Consider for each case where the data values that have been or are about to be committed might be stored. Why is it essential for the TP system to know? Why might a conventional operating system not make this information available?

a) undo all the invocations of the transaction. Note that the transaction id and the operations that have been carried out must be recorded in persistent store if any of the results have reached persistent store. If all information is in main memory its loss would make the situation as though the transaction had not taken place.

b) the result must persist whatever happens. Information must be recorded with redundancy in persistent store on commit.

c) If the decision point is passed then the transaction is committed. Information must be in persistent store before the decision to commit can be taken. We hope that recovery is fast enough for the client to be informed. The seat is booked for her but she hasn’t been told.

20-2 A recovery log is very large and it may be used for transaction abort as well as crash recovery. How can a periodic checkpoint procedure help to manage this complex process?

Checkpoints add structure to the log. After a checkpoint is taken we know that object updates and log records for the transactions noted in the checkpoint have reached persistent store. Crash recovery can start from the most recent checkpoint.

20-3 Why must undo and redo operations be idempotent?

A crash may take place while recovery is in progress. We do not know on restart whether or not we have undone or redone an operation.

20-4 Consider a TP system based on the bank account objects used as an example in Section 20.6 and shown in Figure 20.4.

For each operation define an undo operation.

credit: debit and vice versa

add-interest-to-balance: the inverse computation can be carried out

set-interest-rate: this is an overwrite and can only be undone if pre- and post-state is recorded.

Suppose that a recovery log record contains:

Transaction id, object id, operation, arguments, state prior to invocation.

Define recovery procedures for transactions in all the states shown in Figure 20.2 when a crash occurs. Consider the possibility of a crash during the recovery procedures.

The recovery algorithm given in Section 20.5 can be followed. The undo and redo operations must be made idempotent to allow for a crash during recovery. With information recorded as stated above an undo requires only that we set an object’s state to its value before the invocation. We have sufficient information to redo an operation.

20.5 Redesign the recovery algorithm of Section 20.8 for a non-strict execution schedule (where cascading aborts must be allowed for).

Chapter 21 Distributed Transactions

Exercises

21-1 In what ways do distributed systems differ from centralised ones?

Revision of Section 5.5: no global time; independent failure modes; no quiescent, therefore consistent, system state.

21-2 How can the components of a distributed system agree on a basis for establishing system time? Under what circumstances might system time be needed?

Revision again, see Chapter 5.

21-3 Relate the object model in Section 21.3 and the distributed TPS of Section 21.5 with the discussion on the implementation of naming, binding, locating and invoking objects of Section 15.9.

First, consider the component TPSs as a distributed application that a client may wish to invoke. The components may be named as a set of application servers. The objects they manage are named internally by the TPS. This is the client-server model.

A more general object model might support global naming and invocation of the data objects. How might this be managed in a system?

We have persistent objects managed by an application. We assume a naming scheme for the objects within the TP system. The TP system might use a private naming scheme and a conventional file service which happens to be distributed.

If the underlying system is based on a client-server model it will provide facilities for naming and locating services. The object names are likely to have a component which identifies the server which created them. An invocation can then be directed to that server.

If the TP system is built in an object oriented world the (persistent) objects it manages might be made known to the underlying system and named and located through system policies and mechanisms.

In general, we assume that an object may be invoked from a TP component at any node of a distributed system. There must be underlying mechanisms to locate the object, given its name. The issue of naming and location is the concern of distributed systems designers.

This question raises issues that lead on to further study in distributed systems.

21-4 Describe the operation of a distributed TPS from the point at which a client submits a transaction to a single component of the TPS.

The transaction is managed from the node at which it is submitted, the "home node". Object invocations are initiated from the home node, for example by means of an RPC for each invocation. The details depend on the method that is used for concurrency control. Invocations may be made at each object, independent of other objects at this stage. Later, commitment will require co-ordination among the objects involved in the transaction.

When the transaction is ready to commit, the home node is responsible for ensuring that commitment is atomic over all the objects involved. If a pessimistic method for concurrency control is used this involves acting as the commitment manager for an atomic commitment protocol, such as two-phase or three-phase commit. The objects are unavailable to other transactions while this is in progress. Note that, in a distributed system, allowance must be made for failures of system components during commitment.

If OCC is used the home node is responsible for managing the validation and commit phase of the transaction. Again, this requires co-ordination among the distributed objects involved in the transaction. Although an object participates in only one validation protocol at a time, it is available to other transactions for other purposes.

21-5 Why are the timestamp ordering (TSO) and optimistic concurrency control (OCC) approaches to concurrency control potentially more suitable for distributed implementation than two-phase locking? How can 2PL be simplified?

Each object decides independently whether an invocation can be carried out (TSO) or committed at that object (OCC). 2PL requires distributed deadlock detection. A simpler approach is that a transaction waits for some specified time to acquire a lock on an object. After this it aborts and frees any other objects it has locked.

21-6 What is involved in the validation phase of OCC in a distributed system?

First, we have to establish that a consistent set of shadows were used by the transaction. This is done on the basis of the timestamps of the shadow objects. See the solutions to the additional exercises of Chapter 21 for an example.

If all is well, we need to check with each object whether any other transaction has committed a conflicting update to it since the shadow was taken by this transaction. If this is the case for any object involved then the transaction is aborted, otherwise commitment is guaranteed. This involves associating a timestamp with the transaction’s updates and sending them to each object involved. Objects must make updates in a globally consistent order.

21-7 Why is a complex protocol required to commit a transaction atomically in a distributed TPS?

An agreement must be reached among distributed objects. We must ensure that all commit or none do. The complexity arises because there are communication delays in a distributed system and any component may fail at any time. Also, an object may be involved in more than one transaction. We may fail to acquire the co-operation of an object in an atomic commitment protocol because it is participating in that for some other transaction. (We may have prevented this possibility by using strict 2PL or strict TSO for concurrency control).

What happens in the two-phase commit protocol if the transaction manager fails - discuss its failure at all relevant stages in the protocol.

The main point is that there is a single point of decision - when the decision to commit or abort is recorded persistently. On restart, the recovery procedures must ascertain the transactions for which commitment was in progress when the crash occurred and the decision reached, if any.

Suppose a participant fails after voting for commit of a transaction. What should it do on restart?

Find out the decision from the commitment manager and act accordingly.

What are the advantages and disadvantages of letting the nodes participating in a two-phase commit know about each other?

Suppose a node crashes and restarts. It needs to find out the decision on commit or abort. It may not be able to contact the commit manager. It may instead contact any other participant.

The disadvantage is extra requests that a given participating node has to handle. The list of participants can be included with normal messages in the protocol and this does not add significantly to the overhead.

Chapter 22 Distributed computations

Exercises

22.1 How is the distributed deadlock detection algorithm of Section 17.10 affected by failure of the nodes involved?

22.2 Expand the algorithms and protocols for cache coherency outlined in Section 7.6.5. Discuss their failure semantics.

22.3 Discuss object coherency and failure semantics for a distributed object mapping system as described in Section 7.7.2.

22.4 Discuss how the nodes executing an atomic commitment protocol after write-quorum assembly should handle an incoming JOIN or LEAVE message from a process which is not in the write quorum.

22.5 For the BULLY and RING election algorithms:

How many messages are involved in electing a new co-ordinator?

What should happen when a failed co-ordinator restarts?

22.6 Give examples of applications which you would you expect to require strong consistency of data and of those which you would expect to prefer or tolerate weak consistency.

22.7 For the n-process mutual exclusion algorithms:

Is each algorithm correct and free from deadlock?

How many messages are required for the entry and exit protocols, with and without failure detection?

How many processes does each process need to communicate with?

How easy is it for the group of processes to be reconfigured

by JOIN and LEAVE?

PART IV CASE STUDIES

Chapter 23 UNIX

Exercises

23-1 What is meant by saying that UNIX is a procedural system? How might the hardware support or assume this style of system design (see Sections 8.3 and 4.11, for an example)?

We must have at least two privilege states of processor execution and trap instructions which cause a change to privilege state in order to enter the OS. There may also be support in the memory management hardware for protected OS regions.

How does procedural execution of the kernel affect the design of the mechanism for synchronisation between the hardware and processes?

If there are no dedicated device handler processes the interrupt service routine cannot just signal the one interested process. We need a more general event structure.

How might a procedural design make it difficult to guarantee a bound for the time between an event occurring and a process that was waiting for it running at user-level?

Everything is done in-process. Your thread of control (which was sleeping, waiting for an event) may have to undertake work for other processes between being awakened (scheduled as a kernel thread) and returning to user mode.

23-2 Assume that all the directories and files in the following pathname and all the inodes except that of the root are on disk rather than in main memory. Assume that each directory may be read by a single disk access.

How many disk reads would it take to resolve the pathname:

/usr/guru/examples/C-examples/C-prog

We need to access 4 additional inodes and read 5 directories (root, usr, guru, examples, C-examples). The number of disk reads depends on whether any of the inodes are in the same disk page and whether the relevant part of each directory can be read with a single disk read.

23-3 Suppose a filing system uses a block size of 4K bytes and a block pointer occupies 4 bytes. What size of file can be accessed from 12 direct pointers in an inode?

48Kbytes

What size of file can be accessed from 12 direct pointers and 2 indirect pointers (see Section 23.6)?

Assume that indirection is via a block which is also of size 4Kbytes, so each indirect block can contain up to 1K block pointers. The maximum file size is 48K+2*(1K * 4K) bytes, or (48K + 2**23) bytes or 12+2K blocks=2060 blocks.

What size of file could be accessed with an additional double and triple indirect pointer?

Add on 1K*1K*4K bytes = 2**32 bytes for a double indirect pointer.

Add on 1K*1K*1K*4Kbytes = 2**42 bytes for a triple indirect pointer.

Figure 23.11 shows the system open file table. Each entry contains a pointer into an open file for use in reading and writing. If the table is designed so that this pointer occupies 4 bytes, how large can a file be?

The maximum offset is 2**32 bytes.

23-4 Suppose that an instructor has written a skill-testing game program called /usr/boss/tester. Her pupils are to run the program which will record their scores in a file called /usr/boss/scores.

How should the instructor set up the access permissions of the two files so that the scores of the pupils can be recorded but only by running the program /usr/boss/tester.

Are the pupils’ own files accessible to the tester program while they are running it?

tester and scores should be set up with access rights (respectively):

owner=all, group=execute, rest=execute, set-uid=true owner =all, group=none, rest=none.

The pupils’ files are accessible if tester is allowed to set the real-user-id to be the same as the effective-user-id.

23-5 What are the advantages and disadvantages of using process-relative names, such as small integers, for open files, pipes, sockets etc?

Small numbers are more convenient to handle than long bit-strings.

They have no meaning outside the context of the owning process(es), so cannot be passed around generally to achieve co-operative sharing. UNIX assumes that this will be achieved by parent-child co-operation - the parent’s open file id’s are passed to the child on create.

23-6 Contrast the effect of the UNIX fork system call with that of the language-level fork described in Sections 4.16, 11.2.4 and 12.8.

UNIX fork creates a new address space with a separate copy of the data space, open file table and so on. A language level fork creates a new process (or thread) which runs in the same address space as its parent. It shares the same memory management tables, open file tables etc. A new stack is created for the new process. The OS may or may not be aware of the new process.

23-7 The UNIX command interpreter (shell) is run as a user-level process. Explain how the shell sets up a process to execute a command and how the command’s arguments might be passed to it. How would you run your own command interpreter instead of the standard shell?

We assume that a shell instantiation has received a command by means of standard input. The command program is to be executed by a new process and arguments must be made available to it. Recall that on fork, the parent’s data space is replicated in the child. This can be used to make the command arguments available to the child. After fork completes the child (process-id=0 returned by fork) execve’s the command program which can pick up its arguments from a place agreed by convention.

The login procedure looks in your password file entry to find out which command interpreter you wish to use.

23-8 UNIX keeps a pool of character buffers of a defined size (say, 32 characters) for terminal I/O. Work out in some detail how a character would be delivered by a device, accepted by an interrupt handler, checked to see whether any special action should be taken (for example on a break or flow control character) and put in a buffer.

A buffer pool of c-blocks is used. For a given terminal a c-list is set up, with c-blocks added as required, for input and for output. On input, each character must be examined; control characters are acted on, not buffered. A process may be blocked on output, before passing all the characters, if its output list is getting too long (until the device has done output to reduce the size of the list).

M J Bach (1986) and Leffler at al. (1989) give details.

23-9 UNIX keeps a cache of disk block buffers. Each buffer has a data area which is the size of a disk block and a header for control information. What information would you keep in the header to control the use of the buffers and to ensure mutually exclusive access to them?

An example is given after Chapter 10.

23-10 A process creates and opens a file and a pipe then creates two child processes. Which system calls can the children use to access both the file and the pipe? Which system calls can be used for the file but not for the pipe?

Read, write and close may be used for both the file and the pipe.

Seek may be used for the file but not for the pipe.

23-11 Extend Figure 23.11 to show two independent sharers of an open file and two sharers of a file which have inherited it from a common ancestor.

The figure should show the independent sharers with a separate entry in the open file table and therefore a pointer each. The children who have inherited the file share a single entry and therefore a single pointer.

23-12 Are UNIX signals a general interprocess synchronisation mechanism? Discuss.

No. In their use they are more akin to exceptions in programming languages. Signals tend to be associated with error conditions.

Re the signal mechanism: a process does not block, waiting for a signal. It may test to see whether a signal has been sent, if not it continues. This typically happens when a process is about to return from kernel to user mode and sometimes during the sleep/wakeup process.

23-13 Why is there no need for semaphores in the UNIX kernel?

Processes executing the kernel are scheduled non-pre-emptively. We always have control of which process is accessing which data structure. Interrupts are forbidden so that a process and interrupt service routine do not access the same data structure at the same time.

23-14 How might a crash make a UNIX filing system inconsistent? Consider a sequence of events that might lose a block to the filing system or might cause a block to be recorded as allocated both to a file and to the free list. How does the way UNIX records the blocks allocated to files help to make consistency checks relatively easy?

Assume main memory is lost on a crash. The inode for a file, the free list and file data are all changed first in main memory, then the changes are written to disk. Blocks may be held in the buffer cache instead of being written to disk synchronously with an operation.

How to lose a block: suppose a block is freed that was in use in some file. The main memory copy of the inode is changed, the block is added to the record of the free list in main memory, the inode is written to disk, there is a crash.

How to doubly allocate a block: a new block is needed, say to append to a file. The main memory inode and free list are changed. The inode is written to disk. There is a crash.

All the blocks allocated to files can be found by reading the inode table (and indirect blocks). There are no chains of blocks to follow.

23-15 Find a list of UNIX commands. Outline how a spelling checker program could be written by composing UNIX commands in a pipeline.

Along the lines of:

cat filename |

tr A-Z a-z | %remove upper case

tr (punct) (null) | % remove punctuation

tr (blanks) (newline) | % each word on a new line

sort | % sort into alphabetical order

uniq | % remove adjacent identicals

common dictionary | % find which words are in a dictionary

and so on: list words which are not in the dictionary.

Chapter 24 Microkernels: Mach and CHORUS

Exercises

24-1 How is a port transferred from one task to another in Mach and from one actor to another in CHORUS?

A Mach message contains typed data. Port is a type. A port name is a capability with port identifier and rights protected by encryption. A port is not permanently bound to one process. Ports, with associated rights, are transferred in messages.

A CHORUS port is attached to a single actor at any time but can be transferred from one actor to another by means of a system call. A message is untyped.

24-2 What is the difference between Mach and CHORUS port sets?

A Mach port set is a single logical unit with a single message queue. Any process with receive rights for the port set can take a message from the queue.

Each port in a CHORUS port group is bound to a single actor. The concept is used to implement multicast. A message sent to a port group is sent to each port in the group.

24-3 What functions have been implemented inside the microkernel in Mach and CHORUS. Which functions are implemented as servers above the kernel?

Mach; tasks, threads, IPC (messages, ports, port sets), hardware exception handling, minimal memory management with the possibility of additional pagers at user level (file I/O is done by mapping memory objects). Network communication is achieved by message passing via a user level server.

CHORUS: actors, threads, IPC (messages, ports, port groups), hardware exception handling, basic memory management with the possibility of user level servers called mappers. Some functions are provided as supervisor actors that have the kernel ‘s level of privilege but are dynamically linkable.

24-4 To what extent do Mach and CHORUS support real-time execution?

They are not specifically real-time kernels. They have more chance of meeting real time requirements than, say, UNIX because they support multiprocessor operation, thread priorities and pre-emptive thread scheduling.

24-5 What are the implementation languages of Mach and CHORUS? C, C++.

24-6 How are memory management techniques used to avoid physical copying of data in Mach and CHORUS?

Data is transferred by message passing. Both systems now use page mapping and copy on write to avoid data copying. In Mach, if an asynchronous send is performed the data is first mapped into the kernel address space, and into the destination address space on receive.

24-7 Which objects in Mach and CHORUS are given globally unique identifiers?

Mach: tasks, ports, port sets

Chorus: actors, ports, port groups.

How are the objects referenced by these identifiers located in each system?

A full answer is not available to this question.

Ports, port sets and port groups are the destination addresses for messages; that is, the IPC mechanism is responsible for locating these objects. For IPC within a single system, the local OS has full knowledge of the names it manages and the bindings between them, for example, of ports and port sets to tasks. For distributed IPC, the names used must be bound to locations by a technique such as those described in Section 14.9. Mach employs a dedicated network server process at each node to handle non-local IPC.

Chapter 25 Windows NT

Has no exercises

Chapter 26 Middleware: CORBA and Java

Has no exercises

Chapter 27 Transaction Processing Monitors and Systems

Exercises

27-1 What problems arise in a TP monitor from using one operating system process per user terminal?

This approach is common in general purpose OS. It may be suitable for small TP systems but does not scale well to systems where there is a large number of terminals. It might be required to extend to a process per terminal per component of a distributed system.

A large number of OS processes means too much space for per-process data structures and too much time in processing them. It is also difficult to give certain application functions low priority at times of heavy load. Priority tends to be associated with processes, not what they are doing.

27-2 How do the problems which arise when single threaded processes are used for building multi-threaded servers apply to TP system implementations? How have these problems been addressed in practice?

Suppose we have threads at language level only. Suppose each thread is associated with a terminal. If a thread makes a synchronous, blocking system call the whole process is blocked. Also, threads share an address space and there is no inter-thread protection. The problems have been addressed by implementing asynchronous I/O interfaces and by standard techniques for achieving protection by software, such as by using a strongly typed programming language.

27-3 Why is connection oriented communication between the components of a TP system less elegant than an RPC system? Why might it be more efficient?

The processes which comprise the distributed components of the TP system set up appropriate long term connections. These are used for the communications of many transactions. The programmer must manage this shared use of the communication channel.

The configuration of a TP system may vary. The programmer must be aware when to use same machine procedure call and cross machine message passing via a connection.

RPC supports a separate communication per transaction and might be implemented transparently to the programmer. RPC could be less efficient than a permanent connection but the RPC system could cache the destination addresses of frequently used remote procedure names.

27-4 Why is the use of a separate process for each component of a TP system more flexible than a shared address space solution?

So that a system may be configured and reconfigured without changing the component programs.

27-5 Indicate where buffered transaction processing is used when goods are paid for by cheque, credit card and debit card. How could online TP be brought into point of sale applications?

The approach is to batch transactions throughout. At present, debit cards function in the same way as cheques and credit cards. Batches are assembled at each vendor and at the vendors’ banks. The central clearing bank creates a batch of transactions for each member bank.

Online TP could be brought into point of sale in theory. The bank account of the purchaser could be interrogated on-line and the debit made. The system would have to be made failure tolerant.

27-6 What are the factors affecting the design of PIN-based authentication?

The PIN must never be transmitted or stored in clear. It must never be printed so that an employee can see it and associate it with an individual.

What would be the problems arising from a 12 digit PIN?

We would forget it or we would write it down.

Why shouldn’t I choose a PIN and tell my card issuer by writing it on a form?

This could be done if a special form with no personal information and with an encrypted account number was used. There could be a risk that those dispatching the form might be see the encrypted account number and the name and address to which it was sent and this process would have to be automated.

27-7 What would be involved in keeping the bank accounts in the bank account database computers and the ATM controllers completely up-to-date?

How to handle replicas is a general question encountered in large-scale distributed system design. A short discussion is given in Section 20.2. The choice is between consistency and availability. We can use a protocol such as 2PC to lock all replicas and propagate and update, or, less stringently, quorum assemble. Or we can live with a degree of inconsistency. In this application it might be that the model of a primary copy with replicas for efficient access is appropriate.

27-8 Why are TP benchmarks needed?

So that TP systems can be compared by purchasers and so that conflicts between purchaser and provider can be resolved. That is, the provider of a system may define its performance in terms of accepted benchmarks. A purchaser in dispute with the provider may be able to demonstrate that the performance is inadequate with respect to the benchmark.

Part IV

Exam Questions

The Cambridge exam questions from 1993 are available at: http://www.cl.cam.ac.uk/tripos/papers.html Relevant courses are:

Concurrent Systems
Distributed Systems
Operating System Foundations
Operating System Functions
Operating Systems
Unix Case Study

From Edition 1: Questions up to 1992

The following are extracted from Computer Science papers at Cambridge. In each case the year the question was set is indicated. I have included a few questions from related areas such as computer architecture and communications and a few from follow-on courses such as Security, Distributed Systems and Computer Systems Modelling to show the context for Concurrent Systems.

PART I

92 Consider the design of operating system facilities to support the input and output of data by user processes. Suppose that there is a dedicated device handler process for each device and that, while the handler transfers a fixed amount of data between memory and the device interface, a user process may request the transfer of an arbitrary amount of data by means of a system call.

Describe how data transfer and synchronisation may be arranged

between the handler and

(a) a system process carrying out a user’s request;

(b) the device.

In each case, outline the data objects, associated operations and synchronising primitives that might be used.

92 Describe the operating system data structures required to support a virtual memory system with both segmentation and paging, and how memory sharing between different processes is implemented.

Describe what operations must be performed by the operating system on an address translation failure and the relative merits of implementing some of these functions in hardware rather than software.

What would be the benefit of implementing both swapping and paging in such a system?

88 Does the ever-increasing availability of real memory do away with the need for virtual memory systems? Distinguish in your answer between the requirements of large timeshared computers and those of single user workstations.

What is the point of using hardware-supported segmentation as well as paging in a virtual memory? Discuss the choices about the size of the segment number field in a virtual address.

88 Describe a hardware method of storage allocation and protection which allows separate concurrent user programs to share library routines in the main store. Estimate the hardware needed and the likely performance reduction. What are the compensating advantages?

89 The following reasons can be given for the use of memory management (relocation and protection) hardware in a computer system:

(a) to provide a larger virtual address space than real memory;

(b) to support the efficient management of real memory by the operating system;

(d) to provide protected sharing of system utilities such as compilers, editors and libraries;

(e) to provide protected sharing of parts of the operating system;

(f) to provide protected sharing of applications subsystems.

For each of the above:

(i) indicate memory management hardware that would meet this requirement;

(ii) discuss the kinds of system for which this requirement would and would not be important.

92 Among the functions of a filing system are path-name resolution, access control, concurrency control and existence control.

A network-based file storage service supports the file abstraction, where a file is an uninterpreted sequence of bytes with a unique identifier.

For each of the four functions listed above, discuss whether it might be located in the service or in the client systems.

92 A file system is required to support random access to files of several tens of megabytes; describe two file structures which would support this efficiently.

For your mechanisms, describe and provide estimates of the costs incurred in accessing random blocks.

How would your two mechanisms cope with the insertion of a new block into the body of the file?

91 Discuss the relative merits of three different methods which can be used to represent a file as a sequence of disk blocks. In each case, what would be a suitable representation for the list of free disk blocks?

How can the algorithm for the allocation of free blocks be tuned for different uses?

91 In a general-purpose filing system the "file" is usually the unit simultaneously of naming, of access control, and of interlock. Discuss the circumstances in which it is suitable to have the same unit for all three, and give examples of each being an unfortunate choice. Under what circumstances might it be desirable to use advisory locks rather than enforced ones?

88 Compare and contrast Access Control Lists and Capabilities as mechanisms for the protection of data.

A user U normally has authority to access files F1 and F2. The user U executes a program P which requires to access file F3. Describe how protection techniques based on (i) Access Control Lists and (ii)

Capabilities may be employed:

(a) to prevent user U from accessing file F3 other than by executing program P;

(b) to prevent program P accessing file F1 when U executes P, whilst allowing P to access F2 when U so require

89 Multi-access operating systems need to provide both protection and sharing. How would the system software, interacting with the hardware, meet the following requirements?

(a) It must be impossible for users to program the line printer directly resulting in interleaved lines of output;

(b) once a user process is running on a processor it must be possible to stop it and let other processes have their turn;

(d) it must be possible for users to execute a shared copy of system utilities such as editors or compilers but impossible for them to corrupt these utilities, the operating system or other users’ code and data;

(e) it must be impossible for one user to login to the system with the identity of another user.

92 Under what circumstances might an operating system’s process scheduler be able to avoid a general process schedule?

How might the priority of processes be made dynamic without a large computational overhead?

A programming language supports lightweight processes which share memory. The operating system on which the language is to run allocates a separate address space to each of its processes. What problems do you foresee with this arrangement?

How could a different operating system design avoid these problems?

91 A function of an operating system is to create the process abstraction. Outline how this is done. How might synchronisation between a process and the hardware be supported? [For example, consider a device handler and the associated device.] How might synchronisation between processes be supported?

91 Outline the design of a scheduler which is required to implement static priorities and time-slicing. Discuss your design choices.

What modifications would you make to support:

(a) a large number of priorities;

(b) multiple threads per virtual address space;

88 The process management subsystem of a multi-access operating system contains a process table with space for a maximum of 100 processes. Each entry in this table is a process descriptor (process base).

Discuss in detail the design options for the process descriptors showing how, for example, process priorities, process scheduling and synchronisation with the hardware might be supported.

89 A distinguished scientist once criticised an operating system named OS/360, alleging that it was wasteful of memory space and that its design was unnecessarily embellished:

For example, OS/360 devotes 26 bytes of the permanently resident date-turnover routine to the proper handling of December 31 on leap years (when it is Day 366). That [function] might have been left to the operator. Source: The Mythical Man-Month (Brooks, 1975)

Contrast and compare the two alternatives for handling extraordinary date calculations (such as leap years): when the operating system handles these, and when the computer operator must handle them. Does your analysis bear out the opinion expressed in the above quotation?

PART II

88 The kernel of a multi-access operating system for a uniprocessor machine is mapped into the address space of every process. Each process occupies a separate address space. The kernel runs almost entirely "in-process" apart from a very small number of dedicated system processes. Processes are scheduled pre-emptively.

(a) Outline the type of facility that might be provided in such a system for communication between processes executing in user mode; i.e. not executing the kernel.

(b) Give a detailed solution for processes accessing a kernel data structure which may be required either for shared reading or for exclusive writing.

89 A language runtime system contains a manager for items of type SEMAPHORE.

(a) If the language offered semaphore operations to the programmer, how would semaphores be declared, initialised and used to achieve:

(i) exclusive access to a shared data structure;

(ii) synchronisation between pairs of processes;

(iii) allocation of multiple instances of a resource?

(b) If the language offered a critical region construct such as:

region v do begin . . . end;

where variable v has been declared as a shared instantiation of some data type, how could the compiler use the semaphore manager to implement this construct? Discuss how synchronisation over the state of the resource protected by the region might be offered in the language and implemented.

91 The requirements for implementing a concurrent system may be stated as follows:

(a) support for separate activities;

(b) support for related activities to work together;

Discuss how these requirements may be met through use of a suitable language system and operating system. Explain why some combinations of language system and operating system will be unable to meet these requirements.

a) Describe how the components of a Remote Procedure Call system relate to the OSI reference model.

b) How does the design of the underlying operating system affect the implementation of an efficient RPC system?

89 The diagram shows the major components of a language level remote procedure call (RPC) system.

For the case where the RPC system uses a request, reply, acknowledge protocol and offers a choice of at most once or exactly once semantics:

(a) explain what happens at points A, B, D and E when the network and both calling and called systems are performing well;

(b) explain how the protocol deals with transient network congestion;

(c) explain how the protocol deals with a crash in the calling system which occurs after a call is made and before the result is received;

(d) explain how the protocol and/or higher levels in the system might handle prolonged network failure or a crash in the called system followed by a restart.

PART III

89 You are designing an airline booking system.

(a) Discuss the facilities you would look for in a language and/or operating system to ensure that an operation such as "book two seats on flight n" could complete correctly.

(b) Explain the additional facilities that would be needed if your system was to automate the booking of seats on a number of connecting flights where all or none of the flights should be booked.

92 Outline the following three approaches to concurrency control in transaction processing systems:

(a) two-phase locking (2PL);

(b) timestamp ordering (TSO);

Say briefly how each of the methods achieves serialisability.

Why are TSO and OCC ostensibly more suitable for implementing a distributed transaction processing system than 2PL?

89 Explain why a transaction of an application program with a database may be regarded both as a unit for recovery and as a unit for concurrency control. Describe briefly two basic approaches by which the scheduler of a Data Base Management System can ensure that the interleaving of operations in concurrently executing transactions cannot lead to inconsistencies in the database. What potential problem(s) of basic Two-Phase Locking does Strict Two-Phase Locking avoid?

91 What are the advantages and disadvantages of a persistent programming language when compared with a relational DBMS for use in database applications? Describe extensions to the compiler and run-time system of a programming language that are required to maintain persistent data values and to support the database programmer.

PART IV CASE STUDIES

92 For the UNIX process state graph given below, describe each transition briefly. Indicate which of the transitions show that UNIX is executed procedurally.

88 UNIX is an old operating system, and since its inception hardware and software techniques have progressed significantly.

Describe which design features are no longer appropriate, still relevant, and now relevant for different reasons.

88 Describe all aspects of how compatible file, device and inter-process input and output are achieved in the UNIX operating system.

What are the good and bad consequences of this generality?

What has been done about the bad consequences in recent versions of UNIX?

89 Explain how UNIX system calls may be used:

(a) to create two processes which read records from a common data file, perform some processing on the records and write results to a common output file;

(b) to create two processes in a pipeline such that one reads units of data from a file, performs some processing on each unit and produces some intermediate output and the other takes that intermediate output, performs further processing and writes a final results file.

For each of (a) and (b) above explain how the system calls are implemented.

91 With reference to the diagram below which shows the basic modular structure of the UNIX kernel, explain briefly:

a) how the buffer cache is used;

b) the UNIX model for achieving dynamic execution of the kernel by processes.

88 Either:

(a) The TRIPOS Operating System disables interrupts whenever critical regions of code are executed. Discuss, with particular reference to TRIPOS message passing, scheduling and storage management, whether this design decision was sensible.

Or:

(b) (i) Describe the relationships between tasks and virtual address spaces in IBM’s MVS operating system and between processes and virtual address spaces in UNIX. What are the advantages and disadvantages of the two approaches?

(ii) A file in the MVS operating system need not occupy a contiguous area of disk. Explain, in general terms only, how the system records where a file’s contents reside on a disk. What are the limitations imposed by this scheme? Are there any advantages?

ARCHITECTURE

91 Use the design of the MIPS R2000 processor to illustrate how instruction execution time can be made to approach one clock cycle.

91 Discuss the approach to provision of support for:

(a) subroutine call and return;

(b) interrupt handling

in the MIPS R2000 processor.

92 The five phases of instruction execution in the MIPS R2000 pipeline have mnemonics: IF RD ALU MEM WB.

(a) Explain what is done in each phase by using the following program fragment as an example. Assume that register 2, $2, contains the address of the start of a data area.

LW $3, 0($2) % load word into $3

LW $4, 4($2)

ADD $5, $4, $3

SW $5, 8($2) % store word from $5

(b) Using the above program, explain why a load delay slot is necessary.

92 A MIPS R2000 virtual address comprises a 20-bit virtual page number and a 12-bit within-page byte offset.

Outline how address translation is performed by the on-chip memory management unit. Explain how a translation lookaside buffer (TLB) entry is set up by an operating system.

Discuss the factors influencing the size of the TLB. Would you expect 64 entries to be sufficient?

89 Many central processing units have features or special instructions which were included because they can be exploited to good effect in or via an operating system. Describe four such features, explaining the consequences for an operating system and for the user of their presence or absence.

SECURITY and PROTECTION

88 Public Key Encryption has been suggested to be a secure means of establishing mutual authentication of two interacting entities on a network.

Starting from the position where neither yet knows the other’s public key, describe carefully how a two-way communication link is set up, identifying at each stage what authenticated information each has gained about the other.

To what extent is this procedure reliant upon the correctness or trustworthiness of an Authentication Server on the network? What advantages does Public Key Encryption have over normal encryption methods?

89 Suppose you are designing the software for a network of banking computers capable of controlling automated cash dispensers and accounts enquiries. Explain how the system and its component parts should be tested.

89 Are there facts about individuals that it is important that government agencies should not know? How difficult is it to stop them from knowing such facts?

COMMUNICATIONS

89 A digital local area network (e.g. Cambridge Fast Ring) is to be used to transmit TV quality video streams in real time. Make notes on the following:

(a) bandwidth requirements per stream;

(b) encoding of the voice and video samples within packets;

(d) limitations of a large system with many users.

88 Describe the operation of a CSMA/CD network.

What are the performance characteristics and how do they change as the speed of transmission is increased or decreased?

89 Describe the basic operation of a CSMA/CD (such as Ethernet) local area network. What is the vulnerable period? How does the effect of the vulnerable period change as the baseband transmission speed is increased or decreased? Comment on the suitability of CSMA/CD for transmission of real-time streams such as telephone traffic.

(a) The quality of service offered by a channel is influenced by the multiplexing strategy used at lower layers. Discuss.

(b) Consider a network composed of hosts and switches interconnected by point-to-point links. Describe the relationship between the multiplexing mechanism used on the links and the technique used for switching. How is this related to processor scheduling?

89 Define the terms coding and multiplexing as applied to communication systems. Give three detailed examples of each, choosing them from different layers of the communications hierarchy.

DISTRIBUTED SYSTEMS

88 You are to advise an international company on the design of a distributed system which will provide both an electronic mail delivery service and a general registration service. A liberal number of dedicated machines are proposed to implement the services, with local disks sufficient to hold both users’ mailboxes and the database to support the services. Principals (e.g. human users, machines, services) are to be identified by a two-dimensional name such as guru@cam. Such names are also to be used for naming groups - lists of principals or other groups.

(a) Define an interface for the mail service.

(b) Define an interface for a registration service which is sufficiently flexible to be used for authorisation, authentication and location.

(d) Describe how you would achieve service availability in the presence of service machine failures.

(e) Outline the approach you would adopt to the consistency of data across the service machines.

89 The heads of all the departments in a certain university have to make a decision. The majority of the heads are hardworking, consistent and loyal, and will make a serious attempt to determine the consensus opinion of their colleagues: once the decision is reached they will carry out the necessary action. It is known, however, that not all of the heads can be trusted. Some will behave in sporadically irrational ways, having cracked under the strain, but others harbour grudges against the university at large and may act with great finesse in ways calculated to disrupt it. Nobody knows which department heads are sound or where suspicion is justified. In this atmosphere of paranoia no central meeting can be arranged. Instead the department heads send individual messages to one another across the newly commissioned university-wide computer network. The computing service is above suspicion, and guarantees that messages sent will be delivered intact, and that the recipient will be able to tell who sent the message. Unfortunately a message can only be sent to one person at once, and can only be forwarded by editing its text into the body of a new message. The computing service charges a flat rate for transmission regardless of message length: university economy guidelines require that any use of the network should be justified.

Design protocols intended to allow the reliable department heads to come to a decision despite the worst endeavours of their incompetent or malicious colleagues. Defend your design by identifying strategies that might be adopted by the unscrupulous to prevent the reliable department heads from reaching the correct consensus, and discuss what happens as the proportion of mavericks in the university grows.

It may be useful to start by considering four departments with at most one traitor.

[reference: Lamport et al., 1982]

91 A client-server style of interaction is to be supported in a distributed system. Discuss the design options for arranging for the operations provided by a server to be invoked by clients. [Hint: you should consider the extent to which distribution transparency can be achieved.]

92 A naming service is to be designed for a large-scale distributed system. Principals such as human users, services and server machines are to be named. The name space should be extensible and amenable to restructuring.

(a) What are the advantages and disadvantages of using pure and impure names?

(b) State and discuss the assumptions you would make about the nature of naming data in such a system.

COMPUTER SYSTEMS MODELLING

91 (a) It is often said that response times are harder to predict than utilisation. Why is this so? Is it important?

(b) Describe and contrast bottleneck analysis and balanced system bounds. When are balanced system bounds more useful?

92 You have 1 Gbyte of keyed records to sort into order using a computer which has 2 Gbyte of available disk and 50 Mbyte of available internal memory. How would you do it and how would the cost depend on the amount of data?

What difference would it make if there were only 5 Mbyte of internal memory?

What difference would it make if the operating system let you have only five files open at a time?

91 (a) What actions take place at a disk when a block is read or written?

What parameters are needed to specify the performance of a disk?

(b) Consider a Markov chain model for a disk which has a multiple stage server, with one exponential stage for each component of disk access. Assuming a fixed arrival rate with exponentially distributed inter-arrival times, sketch the state diagram of the system.

Part V

TRANSPARENCIES in addition to figures

foil on Section 1.5

Requirements for implementing concurrent systems

1. Support for separate activities

2. Support for the management of separate activities

create, run, stop, kill

indicate their relative priorities

3. Support for related activities to work together

both cooperatively and competitively

4. The ability to meet timing requirements

5. Support for composite tasks

A single task may have several components which are executed concurrently

with other tasks. The system may fail after the completion of some of the

subtasks but before the whole task is complete.

foil on Section 2.3 Operating System Functions

• manage resources

allocates processors, memory, disk storage

responds to events associated with resources

• service to clients

creates a high level interface - virtual resources

(virtual processors, virtual memory,

virtual devices, virtual storage (files) )

TRADEOFF:

resource utilisation (throughput)

fairness to all users

Operating system interfaces

• set of commands

to create and compose processes

• set of system calls

processes invoke service

foil on Section 2.5 (and ref Chapter 24)

Microkernel structured operating system

+ tailored configuration

+ small - easier to engineer, debug and maintain

+ services run above the kernel

easier to engineer, tune, update, maintain

+ bounded kernel-execution time

+ policy - mechanism separation

kernel provides minimal mechanism

flexible expression of policy at user level

+ can emulate existing OS while providing new facilities

e.g. UNIX emulation + multiprocessor support

- a service runs more slowly at user level than in the kernel

e.g. standard file service and communications

Chapter 3 The Hardware Interface, I/O and communications

foil on Chapter 3 I/O

Device Management

• review how devices are controlled by program

• this is a function of the OS (WHY?)

too complex for user programs -> libraries?

can create device-independent I/O

must prevent interference between clients

• if clients are prevented from doing I/O (HOW?)

• they need a mechanism to request it (WHAT IS IT?)

• can create high-level virtual devices

e.g. stream I/O

Chapter 4 Support for Processes

foil on scheduling

When is process scheduling needed?

• UNARY

the system is idle and an interrupt arrives which frees a process

• BINARY

a process is running and an interrupt arrives

which frees another process

a process is running and wakes up another process

• GENERAL

the current process executes WAIT and is blocked

foil on Section 5.5

Characteristics of distributed systems

• Independent failure modes

of the components of a distributed program

of the connections between them

• Concept of global time is meaningless

clocks are not identical

communication takes time

a partial ordering of events is possible

• Inconsistent state

propagation of knowledge of events

consistency of (any) object replicas

related changes to distributed objects

no possibility of quiescent state

foil on Section 8.5

Requirements for process interactions

• synchronisation

co-operation through WAIT and SIGNAL

• mutual exclusion

competition for resources

WAIT for a resource

SIGNAL that you have finished with it

foil on Section 8.6

Types of process interaction

• one to one

e.g. pipeline

• any to one

e.g. client-server

• one to many

broadcast (to all) or multicast (to a specified set)

• any to one of many

e.g. client-server(s)

• many to many via shared data

all may read and write the shared data

foil for Exercise 10-6 semaphore solution to sleeping barber problem

(see also Chapter 17 and solutions to exercises for Chapter 17)

waiting : integer :=0; %customers waiting to be cut

guard : semaphore :=1; % to delimit a critical region to protect waiting

customers : semaphore:= 0; %counting semaphore of customers

barber : semaphore :=0; %is barber waiting for a customer (1) or not (0)

the barber executes the following program:

WAIT(customers); %sleeps if there are none

WAIT (guard);

waiting := waiting - 1; %otherwise, changes waiting under exclusion

SIGNAL(barber); % and indicates his readiness to cut hair

SIGNAL (guard);

cut hair

a customer executes:

WAIT (guard); %test and set waiting under exclusion

if waiting < chairs % if there is a free chair to sit and wait

then { waiting := waiting+1;

SIGNAL(customers) % indicate one more customer

SIGNAL(guard) % release the exclusion on waiting

WAIT(barber); % use the barber resource

have haircut; }

else SIGNAL(guard); % if there are no free chairs just release

% the exclusion on waiting and go away.

foil with code for Figure 11.9 monitor for readers and writers

read-write: monitor

entry-procedures startread, endread, startwrite, endwrite

var ar : integer % we now need to know if there are active readers

busy-writing : boolean;

free-to-read, free-to-write : condition;

% in startread a reader now waits for a writing writer tofinish

procedure startread ()

begin ar := ar +1;

if busy-writing then WAIT (free-to-read);

SIGNAL (free-to-read) % If one reader can read, all can read. Each

end startread; % reader wakes up another until none is left.

% in endread the last reader signals a writer

procedure endread ()

begin ar := ar-1;

if ar = 0 then SIGNAL(free-to-write)

end endread;

% in startwrite a writer now waits for a writing writer to finish or for no

% waiting readers

procedure startwrite ()

begin if busy-writing or ar > 0 then WAIT (free-to-write);

busy-writing := true

end startwrite;

% in endwrite any waiting reader is now given priority over any waiting writer

procedure endwrite ()

begin busy-writing := false;

if ar>0 then SIGNAL(free-to-read)

else SIGNAL (free-to-write)

end endwrite;

end read-write;

foil on Section 11.4.2 Active objects in Ada

task-body buffer-manager is

begin

loop

SELECT

when count < buffer-size

ACCEPT insert (parameter) do

[code to put item into buffer]

end

increment count;

[manage pointer to next slot for insertion]

when count>0

ACCEPT remove (parameter) do

[code to take item out of buffer]

end

decrement count;

[manage pointer to next slot for removal]

end SELECT

end loop

end buffer-manager;

foil on monitor solution to Exercise 11-9 sleeping barber problem

barbershop: monitor

waiting : integer := 0; % customers waiting to be cut

customers : condition :=0; % for the barber to wait for a customer

barber : condition := 0; % for a customer to wait for the barber

procedure seek-customer( ) % called by the barber

begin if waiting=0 then WAIT (customers); % sleeps if there are none

waiting := waiting-1; % one less customer waiting

SIGNAL (barber); % frees a waiting customer

end seek-customer ;

procedure get-haircut( ) % called by a customer

begin if waiting < chairs % is there a free chair to sit and wait?

% if there are no free chairs just go away

then { waiting := waiting+1; % one more customer waiting

SIGNAL (customers) % in case the barber is asleep

WAIT (barber); % wait for your turn with the barber

}

end get-haircut;

end barbershop;

foil for Section 15.6.1, see Figure 15.9 RPC

At point A:

• the arguments are packed into a data structure suitable for transfer

across the network,

• an RPC identifier is generated for this call,

• a timer is set.

At point B:

• the arguments are unpacked from the network buffer data structure in a

form suitable for making a local procedure call,

• the RPC identifier is noted.

At point D:

• the return arguments are packed into a network buffer,

• another timer is set.

At point E:

• the return arguments are unpacked,

• the timer set at point A is disabled,

• an acknowledgement is sent for this RPC id

(the timer at D can then be deleted).

Section 15.9.3 ANSA IDL and DPL example

Specify a service interface in IDL:

green : INTERFACE =

begin

lime : OPERATION ( arguments) RETURNS (arguments);

jade : OPERATION ( arguments) RETURNS (arguments);

end

A server may export it to the interface trader:

! USE green

! DECLARE {green_exportRef}: green SERVER

ansa_interfaceRef green_exportRef

! {green_exportRef} := traderRef$EXPORT (green,

" /ANSA/services", \"NAME `green' ", NTHREADS );

A client may import it from the interface trader:

! USE green

! DECLARE {green_importRef}: green CLIENT

ansa_interfaceRef green_importRef;

! {green_importRef} := traderRef$IMPORT (green",

" /ANSA/services", \"NAME `green' " );

and invoke an operation with immediate result:

! {result} := green_importRef $ lime (arguments)

or invoke an operation and collect the result later:

VOUCHER v;

! {v} := green_importRef $ lime (arguments);

! {result} := green_importRef $REDEEM (v);

Section 17.4

Conditions for deadlock to exist

SYSTEM POLICIES:

1. Exclusive access

a request can be refused

2. Hold while waiting

objects are held while more are requested

3. No preemption

a process requests, uses and releases an object

DYNAMIC BEHAVIOUR:

4. Circular wait

a cycle of processes exists such that each holds an object that is

requested by the next process in the cycle and that request cannot be

satisfied

Section 17.5

three foils on solutions to dining philosophers problem

(see solutions to exercises)

A first attempt (with deadlock):

repeat

think

WAIT (fork(i));

WAIT (fork(i@+1));

eat

SIGNAL (fork(i));

SIGNAL (fork(i@+1))

until false;

A solution:

guard: semaphore := 1

repeat

think

WAIT (guard); % put a critical region round

WAIT (fork(i)); % the two WAIT (fork) operations

WAIT (fork(i@+1));

SIGNAL (guard);

eat

SIGNAL (fork(i));

SIGNAL (fork(i @+ 1))

until false;

count: semaphore := 4

repeat

think

WAIT (count); % only 4 at once can pass this point

WAIT (fork(i));

WAIT (fork(i@+1));

eat

SIGNAL (fork(i));

SIGNAL (fork(i@+1));

SIGNAL (count);

until false;

Section 17.7, see Figure 17.7 for notation

Deadlock detection algorithm

The algorithm below marks the rows of the allocation matrix A corresponding

to processes which are not part of a deadlocked set.

1. Mark all null rows of A.

(A process holding no objects cannot be part of

a deadlocked cycle of processes.)

2. Initialise a working vector W = V, the available objects.

3. Search for an unmarked row, say row i,

such that Bi <= W

(the objects that process i is requesting are "available" in W)

If none is found terminate the algorithm.

4. Set W = W + Ai and "mark" row i.

Return to step 3.

When the algorithm terminates,

unmarked rows correspond to deadlocked processes.

Sections 18.4 and 20.3 The ACID properties of transactions

-must be guaranteed by a TPS in the presence of concurrency and

unpredictable failures.

Atomicity

Either all or none of the transaction's operations are performed.

This must be ensured by the recovery manager. A crash may occur part way

through a transaction and the invocations of any incomplete transactions

must be undone.

Consistency

A transaction transforms the system from one consistent state to another.

This is achieved through concurrency control, provided that atomicity is

guaranteed by the recovery manager in the presence of crashes.

Isolation

An incomplete transaction cannot reveal its result to other transactions

before it is committed.

This is achieved through concurrency control.

Durability

Once a transaction is committed the system must guarantee that the results

of its operations will persist, even if there are subsequent system

failures.

This is the responsibility of the recovery manager, based on general data

management policies.

foil on Section 18.6.2 examples of serialisability

£ is UK pound sign, dollars $ will do

serial schedules:

T before S or S before T

T: debit (account-A, $1000); S: read-balance (account-A)

T: credit (account-B, $1000); S: read-balance (account-B);

S: read-balance (account-A); S: print (account-A+account-B);

S: read-balance (account-B); T: debit (account-A, $1000);

S: print (account-A+account-B); T: credit (account-B, $1000);

T: debit (account-A, $1000) and S: read-balance (account-A),

(account-A: T before S)

T: credit (account-B, $1000) and S: read-balance (account-B),

(account-B: T before S)

a non-serialisable schedule:

T: debit (account-A, $1000);

S: read-balance (account-A);

S: read-balance (account-B);

S: print (account-A+account-B);

T: credit (account-B, $1000);

T: debit (account-A, $1000) and S: read-balance (account-A),

(account-A: T before S)

S: read-balance (account-B) and T: credit (account-B, $1000),

(account-B: S before T).

foil on Section 18.8 histories and serialisation graphs

A history is a data structure which represents a concurrent execution of a

set of transactions.

A serialisable history represents a serialisable execution of the

transactions.

That is, there is a serial ordering of the transactions in which all

conflicting pairs of operations at each object are invoked in the same

order as in the given history.

A serialisation graph is a directed graph that shows only transaction

identifiers and dependencies between transactions;

the vertices of the graph are the transactions Ti, and there is an edge

Ti -> Tj if and only if some object is a witness to that order dependency.

A transaction history is serialisable if and only if its serialisation

graph is acyclic.