Chapter 23 Distributed computations

Exercises

23.1 How is the distributed deadlock detection algorithm of Section 18.10 affected by failure of the nodes involved?

The algorithm follows a chain of resource dependencies from node to node and terminates when the probe returns to the initiating node. The failure of any node (or of any communications link between the nodes involved, where there is no redundancy) causes the detection process to hang until the node is restarted. If a node has crashed it can no longer request resources, and the resources it holds should be freed. To be of any practical use the algorithm outlined would therefore have to be extended with failure detection. In this chapter we see examples where deadlock can be broken by simpler means: for example, if two quora are being assembled concurrently the participants may detect this and allow the assembler with the earlier timestamp to win, causing the later one to back off and free its locked objects.
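
As a minimal sketch (the node names and the wait_for map are illustrative assumptions, not the book's notation), the probe-following idea and the effect of a crashed node can be shown as follows; a real implementation would rely on timeouts rather than a hop limit to detect failure.

    # Sketch of edge-chasing deadlock detection across nodes.
    # wait_for maps each node to the node it is waiting on (None if not waiting);
    # 'up' marks nodes that have not crashed. All names are illustrative.

    def probe(initiator, wait_for, up, max_hops=100):
        """Follow the wait-for chain from 'initiator'.
        Returns 'deadlock' if the probe comes back to the initiator,
        'no deadlock' if the chain ends, and 'node failed' if a node
        on the chain is down (the probe would otherwise hang)."""
        current = wait_for.get(initiator)
        hops = 0
        while current is not None and hops < max_hops:
            if not up.get(current, False):
                return "node failed"    # in practice: detected by timeout; free its resources
            if current == initiator:
                return "deadlock"       # probe has returned to the initiating node
            current = wait_for.get(current)
            hops += 1
        return "no deadlock"

    # Example: A waits for B, B waits for C, C waits for A -> deadlock.
    wait_for = {"A": "B", "B": "C", "C": "A"}
    up = {"A": True, "B": True, "C": True}
    print(probe("A", wait_for, up))     # deadlock
    up["C"] = False
    print(probe("A", wait_for, up))     # node failed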

23.2 Suppose a file server supports file caching at clients. Discuss how cache coherency might be maintained. Include possible interactions among the clients after server failure. What restart procedures would you expect at the server and at clients?

First note that a cached file, unlike a replica, is a temporary copy (what might be called a hint in other contexts). Let us ignore replicas and assume that there is a single "first class" copy of a file held by the file service. In Chapter 6 we considered stateless servers, where all a cache manager could do was ask the server for the version number before the cached file was used. We considered changes being "written through" to the primary copy at the file server, thus invalidating other cached copies. A stateful server might notify the cache managers holding these out-of-date copies.
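
A small sketch of the stateless-server scheme, assuming an illustrative version/fetch/store interface (none of these names come from Chapter 6): the cache manager validates its copy against the server's version number before use and writes changes through to the primary copy.

    # Illustrative cache manager for a stateless file server: before each use
    # the client asks the server for the current version number and refetches
    # the file if its cached copy is stale. All names are assumptions.

    class FileServer:
        def __init__(self):
            self.files = {}                     # filename -> (version, data)

        def version(self, name):
            return self.files.get(name, (0, b""))[0]

        def fetch(self, name):
            return self.files.get(name, (0, b""))[1]

        def store(self, name, data):
            version = self.version(name) + 1
            self.files[name] = (version, data)
            return version

    class CachingClient:
        def __init__(self, server):
            self.server = server
            self.cache = {}                     # filename -> (version, data)

        def read(self, name):
            current = self.server.version(name)         # cheap version query
            cached = self.cache.get(name)
            if cached is None or cached[0] != current:
                self.cache[name] = (current, self.server.fetch(name))
            return self.cache[name][1]

        def write(self, name, data):
            version = self.server.store(name, data)     # write through to primary copy
            self.cache[name] = (version, data)

    server = FileServer()
    client = CachingClient(server)
    client.write("notes", b"v1")
    print(client.read("notes"))     # served from cache: version matches
    server.store("notes", b"v2")    # another client updates the primary copy
    print(client.read("notes"))     # stale version detected, file refetched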

A stateful server might also offer a lock service with exclusive or shared locks on whole files or on sub-sequences of files. If the server crashes it has to recover its state on restart by interrogating its clients. If a client fails while holding an exclusive lock, this must be detected (perhaps by a heartbeat protocol among the system nodes) and the lock must be freed. On restart the client has to recover a version of the file that is consistent with everyone else's. If some local updates have not been written through to the server they may have to be discarded; if no updates have been made at the server in the meantime they can be committed now. Handling these issues properly requires the transaction semantics discussed throughout Part III.
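
One way such restart recovery might look, sketched under the assumption that the server can interrogate each client for the locks it believes it holds; recover_lock_table, ask and the lock modes are hypothetical names, not a real file-service API.

    # Sketch of lock-table recovery at a stateful server after a restart.
    # ask(client) stands for an RPC (with a deadline) that returns the set of
    # (filename, mode) locks the client believes it holds.

    def recover_lock_table(clients, ask):
        table = {}                              # filename -> (mode, holders)
        for client in clients:
            try:
                claimed = ask(client)
            except TimeoutError:
                continue                        # client presumed failed; its locks are freed
            for name, mode in claimed:
                prev_mode, holders = table.get(name, (None, set()))
                if holders and "exclusive" in (mode, prev_mode):
                    raise RuntimeError("conflicting lock claims on " + name)
                table[name] = (mode, holders | {client})
        return table

    # Example: one client answers, another has crashed.
    def ask(client):
        if client == "c2":
            raise TimeoutError
        return {("report.txt", "exclusive")}

    print(recover_lock_table(["c1", "c2"], ask))
    # {'report.txt': ('exclusive', {'c1'})}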

In Chapter 23 we have seen higher-level protocols which could be used for collaboration between cache managers. For example, one of the distributed mutual exclusion protocols of Section 23.5 could be used to allow one cached copy at a time to be updated, the changes then being propagated without reference to the file service; alternatively the file service could be sent the updates along with all the holders of cached copies. The failure semantics are as discussed in that section and in the questions below.

In general, higher levels of software may themselves be managing the consistency of their data, employing atomic commitment of updates (Section 22.8) with inbuilt failure handling. The storage service should not enforce any policy that higher levels do not require, for example MRSW (multiple readers, single writer) locks.

23.3 Discuss object coherency and failure semantics for a distributed object mapping system as described in Section 6.8.2.

As in Question 23.2, let us assume a single first-class replica and the possibility that multiple copies of an object are mapped into the main memory of processes. The issues discussed above are relevant to the management of these cached copies; the persistent store may offer exclusive or shared locks; failure of lock-holders must be detected. Alternatively, as above, the managers of the cached objects may cooperate using a distributed mutual exclusion protocol.

In the case of objects, as opposed to byte-sequence files, the logical structure is visible, so the whole object need not be written through when only one part of it changes, even if that part changes in length.
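
A sketch of this point, assuming a hypothetical per-field store interface (ObjectStore and CachedObject are illustrative names): by tracking which fields of a cached object are dirty, only the changed fields need be written through, whereas a byte-sequence file would have to be rewritten from the point of change onwards.

    # Illustrative dirty-field tracking for a cached object: only the fields
    # that actually changed are written through, rather than the whole object.

    class ObjectStore:
        def __init__(self):
            self.objects = {}                       # oid -> {field: value}

        def update(self, oid, changed_fields):
            self.objects.setdefault(oid, {}).update(changed_fields)

    class CachedObject:
        def __init__(self, store, oid, fields):
            self._store, self._oid = store, oid
            self._fields = dict(fields)
            self._dirty = set()

        def set(self, field, value):
            self._fields[field] = value
            self._dirty.add(field)

        def write_through(self):
            changed = {f: self._fields[f] for f in self._dirty}
            self._store.update(self._oid, changed)  # only the changed fields travel
            self._dirty.clear()

    store = ObjectStore()
    obj = CachedObject(store, "obj1", {"name": "x", "items": []})
    obj.set("items", [1, 2, 3])
    obj.write_through()
    print(store.objects)                            # {'obj1': {'items': [1, 2, 3]}}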

23.4 Discuss how the nodes executing an atomic commitment protocol after write-quorum assembly should handle an incoming JOIN or LEAVE message from a process which is not in the write quorum.

Published algorithms tend not to cover dynamic group size.
First assume a hierarchical group structure in which the JOIN and LEAVE messages are sent to the coordinator, here acting as commit manager. A LEAVE message from a process not in the quorum does not affect the 2PC protocol; the read and write quora are still correct if the group size decreases by one (QW > n/2 and QW + QR > n remain satisfied). After the 2PC completes, the LEAVE protocol must be run to inform the group of the identity and location of the member to be removed.
A JOIN message can also be deferred until the 2PC completes; the process group is working with a known size and known read and write quorum sizes. After the 2PC completes, the JOIN protocol is run to add the new member's identity and location. In this case QW and QR have to be redefined if the conditions QW > n/2 and QW + QR > n are no longer satisfied (a sketch of this bookkeeping follows this answer).
With a flat (or peer) group structure the LEAVE and JOIN messages are sent to all processes, including the 2PC coordinator, and here the LEAVE and JOIN protocols are more complex. A scheme can be sketched in which no process adds or removes the sending process until all have acknowledged to all, and those that are part of the quorum defer their acknowledgement until after the 2PC completes.
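
For the hierarchical case, the bookkeeping might look like the following sketch, in which the coordinator simply queues membership messages while a 2PC is in progress and re-derives QW and QR afterwards; quorum_sizes, Coordinator and the message format are illustrative assumptions, and any QW, QR satisfying QW > n/2 and QW + QR > n would do.

    # Sketch of a coordinator that defers JOIN/LEAVE during a 2PC and
    # recomputes the quorum sizes once the commit protocol has finished.

    def quorum_sizes(n):
        qw = n // 2 + 1                     # smallest QW with QW > n/2
        qr = n - qw + 1                     # smallest QR with QW + QR > n
        return qw, qr

    class Coordinator:
        def __init__(self, members):
            self.members = set(members)
            self.pending = []               # membership messages deferred during 2PC
            self.in_2pc = False

        def on_membership_msg(self, msg):   # msg is ("JOIN", pid) or ("LEAVE", pid)
            if self.in_2pc:
                self.pending.append(msg)    # defer until the commit protocol finishes
            else:
                self.apply(msg)

        def apply(self, msg):
            kind, pid = msg
            if kind == "JOIN":
                self.members.add(pid)
            else:
                self.members.discard(pid)

        def finish_2pc(self):
            self.in_2pc = False
            for msg in self.pending:
                self.apply(msg)
            self.pending.clear()
            return quorum_sizes(len(self.members))   # redefine QW, QR for the new size

    coord = Coordinator({"p1", "p2", "p3"})
    coord.in_2pc = True
    coord.on_membership_msg(("JOIN", "p4"))          # deferred
    print(coord.finish_2pc())                        # (3, 2) for n = 4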

23.5 For the BULLY and RING election algorithms:

How many messages are involved in electing a new co-ordinator?

What should happen when a failed co-ordinator restarts?

BULLY: On average n/2 processes must start an election, and each election involves sending messages to on average n/2 processes, awaiting replies, and so on, so the total number of messages involved is O(n^2).
RING: In the worst case, n-1 processes may start off an ELECT(ID) message concurrently. If all of these messages circulate then each process will add its ID to each of them and send them on, again giving O(n^2) messages. But after one message completes the circuit (n sends) a COORD message is sent to all processes and the concurrently circulating ELECT messages can be discarded.
Compared with BULLY, a process receiving ELECT just adds its own ID and passes the message on; it does not start its own election. In BULLY, higher-ID receivers must start their own elections.
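
A sketch of this RING behaviour, in which an ELECT message accumulates IDs until it returns to a process that finds its own ID already present, whereupon the highest ID wins and a COORD message is sent round; the list representation of the ring and the message counting are illustrative assumptions.

    # Illustrative RING election: each process appends its ID to a circulating
    # ELECT message; when a process sees its own ID already in the list the
    # message has gone full circle and the highest ID is the new coordinator.

    def ring_election(ids, starter):
        """ids: process IDs in ring order; starter: index of the initiator."""
        n = len(ids)
        elect = []                          # IDs accumulated by the ELECT message
        i = starter
        messages = 0
        while True:
            me = ids[i]
            if me in elect:                 # message has completed the circuit
                coordinator = max(elect)
                messages += n               # COORD sent round (or broadcast) to all
                return coordinator, messages
            elect.append(me)                # pass ELECT on with own ID added
            i = (i + 1) % n
            messages += 1

    print(ring_election([17, 4, 9, 23, 12], starter=2))   # (23, 10): 5 ELECT + 5 COORD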

When a failed coordinator restarts it will send COORD(ID), and other processes will defer to it if their own IDs are lower. If all processes must ACK to the sender before the restarted coordinator takes over, any process that is in the middle of running some protocol can defer its ACK until that protocol completes.

23.6 Give examples of applications which you would expect to require strong consistency of data and of those which you would expect to prefer or tolerate weak consistency.

There is a tradeoff between consistency and availability. If we are considering replicas, the reasons for replicating (load balancing, bottleneck avoidance, failure tolerance, geographic distribution for rapid communication) may point to weak consistency being most desirable; otherwise every access must involve all the replicas and we are back to heavily loaded, bottlenecked, distant and failure-prone copies.
Naming data, used for name-to-location mapping, must be highly available, and being out of date is not too disastrous: for example, the client finds that the location no longer works and retries later, by which time the update has reached its local replica.
Financial data (banking, stock trading) may be replicated because of the application's inherent distribution, e.g. London, New York, Tokyo, and strong consistency may be required for correctness of the persistent data. For example, in a transaction scenario where money is transferred from one location to another, the money must not be lost or double-credited.
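
As a minimal single-process sketch of that invariant (the account names and amounts are purely illustrative, and a real system would enforce it with the atomic commitment protocols of Section 22.8), either both the debit and the credit are applied or neither is.

    # Illustrative invariant for a funds transfer: either both the debit and
    # the credit happen, or neither does, so no money is lost or double-credited.

    def transfer(accounts, src, dst, amount):
        if accounts[src] < amount:
            raise ValueError("insufficient funds")   # abort: neither update applied
        accounts[src] -= amount
        accounts[dst] += amount                      # in a distributed setting these two
                                                     # updates are committed atomically

    accounts = {"london": 100, "new_york": 50}
    transfer(accounts, "london", "new_york", 30)
    print(accounts)                                  # {'london': 70, 'new_york': 80}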

23.7 For the n-process mutual exclusion algorithms:

(a) Is each algorithm correct and free from deadlock?

(b) How many messages are required for the entry and exit protocols, with and without failure detection?

(c) How many processes does each process need to communicate with?

(d) How easy is it for the group of processes to be reconfigured by JOIN and LEAVE?

(a) Informal reasoning indicates that these published algorithms are correct and free from deadlock, but we only have an outline of them here rather than a full, detailed specification; a formal correctness proof would be needed to answer with confidence. The assumptions should also be stated explicitly, e.g. how many processes may fail, that processes do not cheat, and so on.

(b)
Centralised: two messages for the entry protocol, one for the exit protocol.
With ACKs for failure detection: three or two for the entry protocol, two for the exit protocol.
Token ring: n messages for the token to circulate, but it circulates even when no process wants to enter its critical region. Protocols for detecting that the token has been lost, and for regenerating it without risking two tokens, are O(n). Failure detection to keep the ring intact requires an ACK of every token-passing message.
Distributed: 2(n-1) messages for the entry protocol, O(n) for the exit protocol (the exact number depends on contention). Again, the basic protocol involves waiting for a process to exit its critical region and reply, which requires ACKs to distinguish use of the resource from node failure.
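
A sketch of where the 2(n-1) figure comes from, in the style of the Ricart and Agrawala algorithm: a requester sends a timestamped REQUEST to the other n-1 processes and must collect n-1 REPLYs before entering; a receiver replies at once unless it is in its critical region, or is itself requesting with an earlier (timestamp, id), in which case the REPLY is deferred to the exit protocol. The state dictionary and names below are illustrative assumptions, not from the book.

    # Sketch of the distributed entry decision: reply immediately, or defer the
    # reply until this process leaves its critical region.

    def on_request(state, req_ts, req_id):
        """state holds 'in_cr', 'requesting', 'my_ts', 'my_id', 'deferred'."""
        mine_first = (state["requesting"] and
                      (state["my_ts"], state["my_id"]) < (req_ts, req_id))
        if state["in_cr"] or mine_first:
            state["deferred"].append(req_id)     # reply sent later, in the exit protocol
            return None
        return ("REPLY", req_id)                 # immediate reply

    # Entry costs 2(n-1) messages: n-1 REQUESTs out, n-1 REPLYs back.
    # Exit costs one REPLY per deferred request, O(n) in the worst case.
    state = {"in_cr": False, "requesting": True, "my_ts": 3, "my_id": "p1",
             "deferred": []}
    print(on_request(state, 5, "p2"))            # None: reply deferred
    print(state["deferred"])                     # ['p2']
    print(on_request(state, 1, "p3"))            # ('REPLY', 'p3'): earlier request wins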

(c) Centralised: one - the coordinator.
Token ring: two - those before and after in the ring. But a process must also participate in protocols to regenerate the token (with n-1 processes).
Distributed: n-1.

(d) In all cases the group needs to update its list of the other members' IDs and locations.
Centralised: This does not rely on group protocols. If the coordinator leaves, a new one must be elected, and all members need to know the new coordinator.
Token ring: The ring of processes must be redefined when processes leave or join.
Distributed: This relies on group protocols and it is crucial that all participants atomically change their list of group members.