Many systems in safety-critical areas use replication of vital
components to improve reliability. It is often assumed that such
systems also provide high availability, but these two characteristics
are not the same thing:
A low-level technique for increasing the integrity of information on a
disk is to mirror the disk and use a simple comparator. If the data ever
differs, we then force a failure. This means that the integrity of the
data is twice as good, but even if the comparator itself never fails, a
failure of either disk means the whole system fails. Thus the system
has half the availability, at best.
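The arithmetic behind this can be sketched as follows (a minimal model
assuming independent disk failures and an ideal, never-failing
comparator; the figures are illustrative):

```python
# Availability of a mirrored pair of disks behind a simple comparator,
# assuming independent failures.  'a' is the availability of a single
# disk (the probability it is working at any instant).

def pair_availability(a):
    """The pair is available only when BOTH disks are working, since
    the comparator forces a failure if either one fails."""
    return a * a

a = 0.99
assert pair_availability(a) < a                # the pair is LESS available
# downtime roughly doubles: 1 - a*a ~= 2 * (1 - a) for a close to 1
assert abs((1 - pair_availability(a)) - 0.0199) < 1e-6
```

The same model shows the integrity gain: an undetected error now
requires both disks to return the same wrong data, at the cost of the
availability loss computed above.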
In a distributed system, this distinction is even more important to
appreciate, since the network that connects components may be far less
reliable than the components themselves. Network partitioning will
reduce the availability of distributed replicated objects.
A third characteristic of replicated services is that they may provide
better performance for read access, but are generally slower for update
access. To see why, we must consider how the replication is made
transparent: to provide exactly the same view of the data to all
clients, consistency must be maintained, and the mechanisms for
maintaining consistency have a cost.
This cost arises because a replication system is usually implemented by
turning each operation that applied to a unique object into an atomic
operation comprising a set of sub-operations, one on each of the
replicated objects. Thus locks, or optimistic concurrency control with
rollback and recovery, are required, with all the overheads described
in the previous section.
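A minimal sketch of why a single logical write becomes expensive: it
turns into a lock phase, a sub-operation per replica, and a rollback
path in case of failure part-way through (the names are illustrative):

```python
# One logical write as an atomic operation over all replicas: lock
# everything, apply the sub-operation everywhere, roll back on failure.

class Replica:
    def __init__(self):
        self.value = None
        self.locked = False

def replicated_write(replicas, new_value):
    done = []                          # (replica, old value) for rollback
    try:
        for r in replicas:             # lock phase: one lock per replica
            r.locked = True
        for r in replicas:             # apply the sub-operation everywhere
            done.append((r, r.value))
            r.value = new_value
    except Exception:
        for r, old in reversed(done):  # rollback: restore the old values
            r.value = old
        raise
    finally:
        for r in replicas:             # release all locks either way
            r.locked = False

replicas = [Replica() for _ in range(3)]
replicated_write(replicas, "v1")
assert all(r.value == "v1" for r in replicas)
```

Even in this toy version, the single write costs one lock and one
sub-operation per replica, which is the overhead the text refers to.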
In some systems (such as name and directory servers) updates are far
less frequent than reads. In these services, the master/slave model of
replication is simpler. A master service holds the state which is
downloaded to slaves. Multiple slaves provide higher performance
and availability for reading. Updates only happen on the master, which
then restarts the slaves one by one with the new version.
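A minimal sketch of this master/slave arrangement (class names are
illustrative, and "restarting" a slave is reduced to simply installing
the new state):

```python
# Master/slave replication for read-mostly data: reads go to any
# slave; updates happen only at the master, which then pushes the new
# version to the slaves one by one.

class Slave:
    def __init__(self):
        self.state = None

    def read(self):
        return self.state

class Master:
    def __init__(self, state):
        self.state = state
        self.slaves = []

    def attach(self, slave):
        slave.state = self.state       # download current state on start
        self.slaves.append(slave)

    def update(self, new_state):
        self.state = new_state
        for s in self.slaves:          # "restart" each slave in turn
            s.state = new_state

m = Master("v1")
s1, s2 = Slave(), Slave()
m.attach(s1); m.attach(s2)
m.update("v2")
assert s1.read() == "v2" and s2.read() == "v2"
```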
There are other systems in which it doesn't matter if the information
is only slightly out of date: e.g. the latest weather report.
Then we can afford to propagate updates lazily (i.e. as a read request
to a replica server arrives). The design tradeoffs here are similar
to those for cache designs.
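Lazy propagation can be sketched like a cache with a freshness bound (a
minimal model; `max_age` and the names are illustrative assumptions):

```python
import time

class LazyReplica:
    """A replica that refreshes its copy from the master only when a
    read arrives and the copy is older than the staleness we can
    tolerate -- exactly the tradeoff made in cache designs."""

    def __init__(self, master, max_age):
        self.master = master              # callable returning current value
        self.max_age = max_age            # tolerable staleness, in seconds
        self.value = None
        self.fetched_at = -float("inf")

    def read(self):
        now = time.monotonic()
        if now - self.fetched_at > self.max_age:
            self.value = self.master()    # propagate the update lazily
            self.fetched_at = now
        return self.value                 # possibly slightly out of date

weather = {"report": "rain"}
r = LazyReplica(lambda: weather["report"], max_age=60.0)
assert r.read() == "rain"
weather["report"] = "sun"
assert r.read() == "rain"   # still serving the cached copy within max_age
```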
Some systems allow write access to proceed even when network
partitioning means that consistency cannot be maintained, so that
availability is preserved. This is a reasonable engineering tradeoff.
In this case, we need a scheme
to deal with the rejoining of the distributed system after repair.
This will be based on voting about versions of objects.
Various schemes exist for distributed voting between replicated but
possibly inconsistent servers and clients:
- Majority Consensus
The normal majority consensus algorithm is used for clients updating
replicated servers:
- A client has previously read some version of some item (timestamped).
- When the client updates, it submits the new version to some server
together with the old ones.
- This server then either broadcasts to all the others or simply chains
through them (rather like the directory service mentioned in chapter
2).
To ensure that the servers converge on an update being accepted or
rejected, there is now a voting phase, while the update is "pending":
- The servers recursively ask the next server if this is OK (and tell
it which servers have so far OK'd the update).
- If the update's timestamp is more recent than each server's stored
value, they OK it.
- When the number of OKs in the consensus message is a majority of
the servers, the update starts to be applied.
Suitable use of timeouts ensures that we can operate this scheme even
with a minority of servers unavailable.
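The update path above can be sketched as follows (a single-round
simplification of the chained voting just described; the names are
illustrative):

```python
# Majority consensus over timestamped replicas: each server stores a
# (value, timestamp) pair and OKs a pending update only if it is newer
# than the stored value; the update is applied once a majority OK it.

class Server:
    def __init__(self):
        self.value, self.timestamp = None, 0

    def vote(self, ts):
        # OK the pending update if it is newer than our stored value
        return ts > self.timestamp

def majority_update(servers, new_value, ts):
    oks = [s for s in servers if s.vote(ts)]
    if len(oks) * 2 > len(servers):        # a majority have OK'd it
        for s in oks:
            s.value, s.timestamp = new_value, ts
        return True
    return False                           # update rejected

servers = [Server() for _ in range(5)]
assert majority_update(servers, "x", ts=1)      # fresh update accepted
assert not majority_update(servers, "y", ts=1)  # stale timestamp rejected
```

A real implementation would also time out unresponsive servers, so
that, as noted above, a majority can still accept or reject the update
with a minority unavailable.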
- Weighted Voting
Weighted voting schemes allow for weights to be associated with the
different replicated objects, both for read and update access. Then
read operations need some quorum of votes before they proceed, while
updates need some different quorum.
In this way, we can get higher read availability, though possibly of
older copies, whilst maintaining high integrity for writing.
This is an important property in distributed systems, since there
are often system objects which are rarely updated but frequently read
(e.g. executables on workstation disks).
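A minimal sketch of the quorum arithmetic, with illustrative weights:
the read quorum r and write quorum w are chosen so that r + w exceeds
the total weight, guaranteeing every read set overlaps every write set.

```python
# Weighted voting: each replica carries a vote weight; an operation
# proceeds only once the collected weights reach its quorum.

def has_quorum(collected_weights, quorum):
    return sum(collected_weights) >= quorum

weights = [1, 1, 1, 2]    # illustrative per-replica vote weights
total = sum(weights)      # 5
r, w = 2, 4               # r + w = 6 > 5: reads and writes must overlap
                          # 2 * w = 8 > 5: two writes cannot both succeed

assert has_quorum([1, 1], r)           # two light replicas suffice to read
assert not has_quorum([1, 1, 1], w)    # three light replicas cannot write
assert has_quorum([1, 1, 2], w)        # ...but adding the heavy one can
```

Making r small and w large gives exactly the tradeoff described above:
cheap, highly available reads against slower but high-integrity writes.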