Chapter 15 Crash resilience and persistent data
Objectives
To extend the definition of "atomic operation" to include an operation on data held in persistent store. To consider the possibility that a crash might occur at any time and to consider how this should be taken into account in system design. To outline approaches to implementing atomic operations on persistent data in the presence of concurrency and crashes.
Points to emphasise
- At this stage we are focussing on a single computer (not a distributed system) and introducing operations on data in persistent store.
- The contents of main memory may be lost on a crash at any time.
- A persistent programming language (PPL) might be available to manage the persistent data. More likely, a DBMS will manage fine-grain persistent data in files, which it accesses through an OS. We assume a DBMS with concurrent users.
- An end user may receive a result from the system (such as a DBMS) and go away. That result must have been stored persistently.
- We are concerned with achieving atomicity in the presence of crashes. The techniques already covered in Part 2 have solved the problem of atomicity (of a single operation invocation on data in main memory) in the presence of concurrency. We assume the DBMS or PPL will control concurrent access to shared persistent data.
- An idempotent operation is repeatable: executing it several times has the same effect as executing it once. It is not possible to make all operations idempotent.
- An atomic operation in the presence of crashes is implemented by means of a great deal of mechanism, such as recording the object's state before and after the operation in a log, or keeping an old version of the object intact while the new version is built up.
- The concept of atomic operation is widely applicable in systems and applications, in particular, in distributed systems as we shall see later.
- An abstract operation should be an atomic operation; see Gleeson (1989).
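The points on idempotence and logging can be illustrated with a minimal sketch. It is not taken from the chapter: the names `append_log`, `credit`, `recover` and `demo`, the JSON log format and the use of a Python dict to stand in for the persistent object store are all assumptions made for illustration, and a single operation on a single object is in progress at any time. The log records the state before and after each update, so recovery only ever sets a value. Setting a value is idempotent, whereas "add 10" is not, which is why recovery can safely be repeated if a further crash interrupts it.

```python
import json
import os
import tempfile


def append_log(log_path, record):
    """Append one record to the log and force it to disk before returning."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())


def credit(store, log_path, key, amount):
    """Credit an account; `store` (a dict) stands in for persistent state."""
    before = store[key]
    after = before + amount
    # Log the before and after state first, then change the object.
    append_log(log_path, {"op": "update", "key": key,
                          "before": before, "after": after})
    store[key] = after  # a crash around here is repaired from the log
    append_log(log_path, {"op": "commit", "key": key})
    return after


def recover(store, log_path):
    """Redo committed updates and undo an uncommitted one, if any."""
    pending = {}
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            if rec["op"] == "update":
                pending[rec["key"]] = rec
            elif rec["op"] == "commit":
                done = pending.pop(rec["key"])
                store[rec["key"]] = done["after"]   # redo: set, not add
    for key, rec in pending.items():
        store[key] = rec["before"]                  # undo: set, not subtract


def demo():
    """One completed credit, then a simulated crash during a second one."""
    with tempfile.TemporaryDirectory() as d:
        log_path = os.path.join(d, "ops.log")
        store = {"acct": 100}
        credit(store, log_path, "acct", 10)
        # Simulated crash: update logged and applied, commit never written.
        append_log(log_path, {"op": "update", "key": "acct",
                              "before": 110, "after": 120})
        store["acct"] = 120
        recover(store, log_path)
        first = store["acct"]
        recover(store, log_path)    # repeating recovery changes nothing
        return first, store["acct"]
```

Calling `demo()` leaves the account at 110 after each recovery pass: the completed credit survives and the interrupted one is undone. A real DBMS log would also need log truncation and support for concurrent operations, which this sketch leaves out.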
Possible difficulties
The difficulty is to restrict the discussion to a single operation invocation on a single object, so that the concepts are introduced as simply as possible.
Teaching hints
- Start from an operation on an object in the main memory of a single-computer system. The result of the operation is lost on a crash. The result must be stored in persistent memory before it is reported to the user. Consider application areas where this is important.
- The logging and shadowing approaches to achieving atomicity are introduced here. It is a good idea to set exercises at this stage, in this simple context, to have the concepts in place before the complexity of composite operations is introduced in Part 3.
- Consider a workstation of the future with a large main memory, say 400 Mbytes. How might this memory be used? Should there be a substantial amount of non-volatile main memory as well or should we rely on remote file servers, accessed across fast networks, for persistent storage?
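For an exercise on the shadowing approach mentioned in the hints above, a sketch along the following lines may be a useful starting point. It is an illustration under stated assumptions rather than the chapter's own method: the object is a small JSON file, `shadow_update` and `demo_shadow` are hypothetical names, and the sketch relies on `os.replace` switching the file name to the new version in a single step, which holds on POSIX file systems. The old version stays intact until the new version has been completely written and forced to disk.

```python
import json
import os
import tempfile


def shadow_update(path, update):
    """Build a new version beside the old one, then switch to it in one step."""
    with open(path, encoding="utf-8") as f:
        state = json.load(f)
    new_state = update(state)

    # The shadow copy lives in the same directory so the final rename
    # stays within one file system.
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(new_state, f)
            f.flush()
            os.fsync(f.fileno())     # new version is on disk before the switch
        os.replace(tmp_path, path)   # old version is replaced only now
    except BaseException:
        # The old version was never touched; discard the partial shadow copy.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return new_state


def demo_shadow():
    """One successful update, then one that fails while the shadow is written."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "account.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"balance": 100}, f)

        shadow_update(path, lambda s: {"balance": s["balance"] + 10})

        try:
            # A set cannot be written as JSON, so writing the shadow fails.
            shadow_update(path, lambda s: {"balance": {1, 2}})
        except TypeError:
            pass

        with open(path, encoding="utf-8") as f:
            return json.load(f), sorted(os.listdir(d))
```

After `demo_shadow()` the file still holds the result of the successful update and no partial shadow copy remains. Students could be asked what would go wrong if the new version were written directly over the old file instead.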