Chapter 21 Distributed Transactions
Exercises
21-1 In what ways do distributed systems differ from centralised ones?
Revision of Section 5.5: there is no global time; components fail independently of one another; and, because the system never becomes quiescent, there is no consistent global state that can be observed directly.
21-2 How can the components of a distributed system agree on a basis for establishing system time? Under what circumstances might system time be needed?
Revision again; see Chapter 5. Briefly, components may agree on time by synchronising their physical clocks through message exchange, or by using logical clocks that capture only the ordering of events. System time is needed, for example, to timestamp transactions for concurrency control methods such as TSO and to order events in distributed protocols.
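As an illustration of the logical-clock approach, the sketch below implements a Lamport clock in Python; the class and method names are ours, chosen for illustration:

```python
# Minimal sketch of a Lamport logical clock (Chapter 5 material).
# All names here are illustrative, not from the book.

class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event and return its timestamp."""
        self.time += 1
        return self.time

    def send(self):
        """Timestamp an outgoing message."""
        return self.tick()

    def receive(self, msg_time):
        """Merge an incoming message's timestamp so that causally later
        events always carry larger timestamps."""
        self.time = max(self.time, msg_time) + 1
        return self.time
```

Timestamps from such a clock respect causality but are not unique across nodes; appending a node identifier as a tie-breaker gives a total order suitable, for example, for transaction timestamps.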
21-3 Relate the object model of Section 21.3 and the distributed TPS of Section 21.5 to the discussion of the implementation of naming, binding, locating and invoking objects in Section 15.9.
First, consider the component TPSs as a distributed application that a client may wish to invoke. The components may be named as a set of application servers, and the objects they manage are named internally by the TPS. This is the client-server model.
A more general object model might support global naming and invocation of the data objects. How might this be managed in a system?
We have persistent objects managed by an application, and we assume a scheme for naming these objects within the TP system. The TP system might use a private naming scheme together with a conventional file service which happens to be distributed.
If the underlying system is based on a client-server model, it will provide facilities for naming and locating services. The object names are likely to have a component which identifies the server that created them, so an invocation can be directed to that server.
If the TP system is built in an object-oriented world, the (persistent) objects it manages might be made known to the underlying system and named and located through system-wide policies and mechanisms.
In general, we assume that an object may be invoked from a TP component at any node of a distributed system. There must be underlying mechanisms to locate the object, given its name. The issue of naming and location is the concern of distributed systems designers.
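To make the location step concrete, here is a hypothetical sketch in which an object's global name carries the identifier of the server that created it, so that an invocation can be routed to that server. The registry and all names are assumptions for illustration:

```python
# Hypothetical sketch: routing an invocation using a structured object name.

from collections import namedtuple

# A global name: which server created the object, plus a server-local id.
ObjectName = namedtuple("ObjectName", ["server_id", "local_id"])

# Illustrative registry mapping server identifiers to network locations;
# in practice a name service would provide this mapping.
server_locations = {"tps-A": "host1:5000", "tps-B": "host2:5000"}

def rpc(location, local_id, operation, args):
    """Placeholder for the underlying remote procedure call mechanism."""

def invoke(name, operation, *args):
    """Locate the managing server from the name, then direct the call to it."""
    location = server_locations[name.server_id]            # locate
    return rpc(location, name.local_id, operation, args)  # invoke
```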
This question raises issues that lead on to further study in distributed systems.
21-4 Describe the operation of a distributed TPS from the point at which a client submits a transaction to a single component of the TPS.
The transaction is managed from the node at which it is submitted, the "home node". Object invocations are initiated from the home node, for example by means of an RPC for each invocation; the details depend on the method used for concurrency control. At this stage invocations may be made at each object independently of the others. Later, commitment will require co-ordination among all the objects involved in the transaction.
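A sketch of this invocation phase from the home node, with a placeholder rpc helper standing in for the real communication mechanism, might look like:

```python
# Sketch of the invocation phase as seen from the home node.

def rpc(node, obj, operation, args):
    """Placeholder for the underlying remote procedure call mechanism."""

def run_transaction(invocations):
    """invocations: a list of (node, object, operation, args) tuples
    making up the transaction submitted at the home node."""
    results = []
    for node, obj, operation, args in invocations:
        # Each object is invoked independently at this stage;
        # co-ordination across objects is deferred until commitment.
        results.append(rpc(node, obj, operation, args))
    return results
```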
When the transaction is ready to commit, the home node is responsible for ensuring that commitment is atomic over all the objects involved. If a pessimistic method of concurrency control is used, the home node acts as the commitment manager for an atomic commitment protocol, such as two-phase or three-phase commit. The objects are unavailable to other transactions while this is in progress. Note that, in a distributed system, allowance must be made for failures of system components during commitment.
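The sketch below outlines two-phase commit from the commitment manager's side. The send, collect_votes and log helpers are assumptions made for illustration; a real protocol must also apply timeouts to cope with the component failures noted above:

```python
# Minimal sketch of a two-phase commit co-ordinator (commitment manager).

def two_phase_commit(participants, tid, log, send, collect_votes):
    # Phase 1: ask every participant to prepare and vote.
    for p in participants:
        send(p, ("PREPARE", tid))
    votes = collect_votes(participants, tid)  # would use timeouts in practice

    # Single point of decision: record the outcome persistently *before*
    # announcing it, so recovery can re-derive the decision after a crash.
    decision = "COMMIT" if all(v == "YES" for v in votes) else "ABORT"
    log.write(tid, decision)

    # Phase 2: propagate the decision; the objects remain unavailable to
    # other transactions until it arrives.
    for p in participants:
        send(p, (decision, tid))
    return decision
```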
If OCC is used, the home node is responsible for managing the validation and commit phases of the transaction. Again, this requires co-ordination among the distributed objects involved in the transaction. Although an object participates in only one validation protocol at a time, it remains available to other transactions for other purposes.
21-5 Why are the timestamp ordering (TSO) and optimistic concurrency control (OCC) approaches to concurrency control potentially more suitable for distributed implementation than two-phase locking? How can 2PL be simplified?
Under TSO and OCC, each object decides independently whether an invocation can be carried out (TSO) or whether a transaction's updates can be committed at that object (OCC), so no system-wide co-ordination is needed for concurrency control itself. 2PL, in contrast, requires distributed deadlock detection. A simpler approach is for a transaction to wait only a specified time to acquire a lock on an object; if the lock has not been acquired when this time expires, the transaction aborts and frees any other objects it has locked.
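The timeout rule can be sketched as follows, using Python's standard threading locks; the timeout value and the exception are illustrative:

```python
# Sketch of simplified 2PL: wait a bounded time for each lock, and abort
# (releasing locks already held) if any cannot be acquired.

import threading

class TimeoutAbort(Exception):
    """Raised when a lock cannot be acquired within the allowed time."""

def acquire_all(locks, timeout=1.0):
    held = []
    try:
        for lock in locks:
            if not lock.acquire(timeout=timeout):
                raise TimeoutAbort("aborting: possible deadlock")
            held.append(lock)
        return held
    except TimeoutAbort:
        for lock in held:   # free any objects already locked
            lock.release()
        raise
```

This trades deadlock detection for the possibility of aborting transactions that were merely waiting, not deadlocked.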
21-6 What is involved in the validation phase of OCC in a distributed system?
First, we have to establish that a consistent set of shadows was used by the transaction. This is done on the basis of the timestamps of the shadow objects. See the solutions to the additional exercises of Chapter 21 for an example.
If all is well, we need to check with each object whether any other transaction has committed a conflicting update to it since the shadow was taken by this transaction. If this is the case for any object involved, the transaction is aborted; otherwise commitment is guaranteed. This involves associating a timestamp with the transaction's updates and sending them to each object involved. Objects must make updates in a globally consistent order.
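A sketch of the conflict check and timestamped commit follows; the attributes on the transaction and object records are assumptions for illustration, and the shadow-set consistency check described above is taken as already done:

```python
# Sketch of distributed OCC validation and commit.

def validate_and_commit(txn, objects):
    # Validation: has any other transaction committed a conflicting
    # update since this transaction took its shadow of the object?
    for obj in objects:
        if obj.committed_timestamp > txn.shadow_timestamp[obj.name]:
            return "ABORT"

    # Commit: apply the updates under the transaction's timestamp; every
    # object applies updates in timestamp order, giving a globally
    # consistent order of commitment.
    for obj in objects:
        obj.apply(txn.updates[obj.name], timestamp=txn.timestamp)
    return "COMMIT"
```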
21-7 Why is a complex protocol required to commit a transaction atomically in a distributed TPS?
An agreement must be reached among distributed objects: we must ensure that either all of them commit or none do. The complexity arises because there are communication delays in a distributed system and any component may fail at any time. Also, an object may be involved in more than one transaction, so we may fail to acquire the co-operation of an object in an atomic commitment protocol because it is participating in the protocol for some other transaction. (We may have prevented this possibility by using strict 2PL or strict TSO for concurrency control.)
What happens in the two-phase commit protocol if the transaction manager fails? Discuss its failure at all relevant stages in the protocol.
The main point is that there is a single point of decision: the moment at which the decision to commit or abort is recorded persistently. On restart, the recovery procedures must ascertain which transactions had commitment in progress when the crash occurred and what decision, if any, had been reached.
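A sketch of this recovery procedure, with an assumed log interface, is:

```python
# Sketch of commitment-manager recovery after a crash.

def recover(log, participants_of, send):
    for tid, record in log.in_progress():
        if record.decision is None:
            # Crashed before the decision was recorded persistently:
            # no participant can have been told to commit, so abort.
            record.decision = "ABORT"
            log.write(tid, record.decision)
        # The decision is (now) persistent: re-send it so that participants
        # blocked awaiting the outcome can proceed.
        for p in participants_of(tid):
            send(p, (record.decision, tid))
```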
Suppose a participant fails after voting for commit of a transaction. What should it do on restart?
Find out the decision from the commitment manager and act accordingly.
What are the advantages and disadvantages of letting the nodes participating in a two-phase commit know about each other?
Suppose a node crashes and restarts. It needs to find out the decision on commit or abort, but may not be able to contact the commitment manager; it may instead contact any other participant.
The disadvantage is the extra requests that a given participating node has to handle. The list of participants can be included with the normal messages of the protocol, so it does not add significantly to the overhead.