Replication

Many systems in safety-critical areas use replication of vital components to improve reliability. It is often assumed that such systems also provide high availability, but these two characteristics are not the same thing. A low-level technique for increasing the integrity of information on a disk is to mirror the disk and use a simple comparator; if the data ever differs, we force a failure. The integrity of the data is then twice as good, but even if the comparator never fails, a failure of either disk means the whole system fails. Thus the system has at best half the availability: if each disk is up with probability p, the mirrored pair is up only with probability p², so for p close to 1 the unavailability roughly doubles. In a distributed system this distinction is even more important to appreciate, since the network that connects components may be far less reliable than the components themselves, and network partitioning will reduce the availability of distributed replicated objects.

A third characteristic of replicated services is that they may provide better performance for read access, but are generally slower for update access. To see why, consider how the replication is made transparent: to give exactly the same view of the data to all clients, consistency must be maintained, and the mechanisms for maintaining consistency have a cost. The cost arises because a replication system is usually implemented by turning each operation on what was a unique object into an atomic operation comprising a set of sub-operations, one on each replica. Thus locks, or optimistic concurrency control with rollback and recovery, are required, with all the overheads described in the previous section.

In some systems (such as name and directory servers) updates are far less frequent than reads. For these services the simpler master/slave model of replication suffices: a master service holds the state, which is downloaded to slaves; multiple slaves provide higher performance and availability for reading; updates happen only on the master, which then restarts the slaves one by one with the new version. There are other systems in which it does not matter if the information is slightly out of date (e.g. the latest weather). There we can afford to propagate updates lazily, i.e. as read requests arrive at a replica server. The design tradeoffs here are similar to those for cache designs.

Some systems allow write access to proceed even when network partitioning means that consistency cannot be maintained, so that availability is preserved; this is a reasonable engineering tradeoff. In this case we need a scheme to deal with the rejoining of the distributed system after repair, based on voting about versions of objects. Various schemes exist for distributed voting between replicated but possibly inconsistent servers and clients; two are described below, after a short sketch of version voting on rejoin.
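As a minimal sketch of version voting when a partition heals, suppose each replica holds a timestamped version of an object. The majority-wins rule with a newest-timestamp tie-break used here is an assumption of this sketch, not the rule of any particular system:

    from collections import Counter

    def reconcile_after_rejoin(replicas):
        # `replicas` is a list of (timestamp, value) pairs, one per replica.
        # If a strict majority hold the same version, that version wins;
        # otherwise fall back to the newest timestamp (sketch assumption).
        votes = Counter(replicas)
        version, count = votes.most_common(1)[0]
        if count > len(replicas) // 2:
            return version
        return max(replicas)

For example, reconcile_after_rejoin([(3, "x"), (3, "x"), (2, "y")]) returns (3, "x"): the majority version wins even though no single server is authoritative.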
  1. Majority Consensus The normal majority consensus algorithm is used for clients updating replicated servers:
    • A client has previously read some version of some item (timestamped).
    • When the client updates, it submits the new version to some server, together with the old (timestamped) version it read.
    • This server then either broadcasts to all the others or simply chains through them (rather like the directory service mentioned in chapter 2).
    To ensure that the servers converge on an update being accepted or rejected, there is now a voting phase, while the update is "pending":
    • The servers recursively ask the next server whether the update is OK, passing along the set of servers that have so far okayed it.
    • If the update's timestamp is more recent than a server's stored value, that server OKs it.
    • When the number of OKs in the consensus message reaches a majority of the servers, the update starts to be applied.
    Suitable use of timeouts ensures that we can operate this scheme even with a minority of servers unavailable; the voting phase is sketched below.
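    A minimal sketch of the voting phase in Python, assuming each server stores a single (timestamp, value) pair for the object, and eliding the chaining, broadcast and timeout machinery described above:

        def majority_consensus_update(servers, new_value, new_ts):
            # A server OKs the update if its stored timestamp is older.
            oks = [s for s in servers if new_ts > s["ts"]]
            if len(oks) > len(servers) // 2:
                # A majority okayed it: the update starts to be applied.
                for s in servers:
                    s["ts"], s["value"] = new_ts, new_value
                return True   # accepted
            return False      # rejected

    For example, with five servers all holding timestamp 1, an update with timestamp 2 gathers five OKs and is applied; a stale update with timestamp 1 gathers none and is rejected.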
  2. Weighted Voting Weighted voting schemes allow weights to be associated with each copy of a replicated object, both for read and update access. Read operations then need some quorum of votes before they proceed, while updates need a different quorum. In this way we can get higher read availability (possibly of older copies) whilst maintaining high integrity for writing. This is an important property in distributed systems, since there are often system objects which are rarely updated but frequently read (e.g. executables on workstation disks). The quorum arithmetic is sketched below.
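    A sketch of the quorum arithmetic, in the style of Gifford's weighted voting (the function name is illustrative). With total weight N, requiring r + w > N makes every read quorum intersect every write quorum, and 2w > N prevents two write quorums forming concurrently; relaxing r + w > N is what buys higher read availability at the price of possibly older copies:

        def quorums_are_sound(weights, r, w):
            # `weights` maps replica name -> vote weight; r and w are the
            # read and write quorum sizes in votes.
            n = sum(weights.values())
            return r + w > n and 2 * w > n

    For example, with three replicas of weight 1 each, quorums_are_sound({"a": 1, "b": 1, "c": 1}, r=1, w=3) is True: any single replica can serve a read, but a write needs all three.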