Optimistic Concurrency Control

Optimistic concurrency control differs from the use of locks fundamentally, in that it is based on the assumption (observation) that in many systems, the vast majority of operations are independent. To this end, it avoids the expensive synchronisation and checkpointing necessary for transaction systems. [Ref: Strom/Yemini]. The distributed system is divided into a number of logical recovery units. Each of these periodically checkpoints its state, and continuously logs the stream of messages arriving at it, in such a way as to be able to roll back its state to the previous checkpoint. The state intervals of the system are partially ordered by a ``causal relation''. This is so that when there is a fault that leads to some partial failure in the system, the task of rolling back the whole system to a know state is a tractable one, since the system is modularised. The idea of optimistic concurrency control is that by analysing the dependencies, one can generate ``commit guards''. Committ guards are effectively the list of other recovery unit states that this execution is dependent on. Roll back based on logged state is only necessary if at some later stage it turns out one of the recovery units failed, perhaps with a lost message. Usually, systems built around this approach also engineer their networks to make message loss a rare event. Thus this is really an approach for partitioning up the recovery problem into smaller pieces so that the cost of keeping enough information to carry out roll-back is not too high in any one part..