So far in this chapter we have described techniques for isolating faults and
helping the distributed applications programmer avoid some of the pitfalls of
synchronization and consistency in a distributed system. We have not described
how a distributed system can improve availability in any way.
The starting point is to recognize that although there are more possible
independent failure modes in a distributed system than there are in a
centralized one, a given application may need only a few of the hosts or storage
systems to be running at any one time.
Then we should note that there is a high degree of replication of hardware,
communications and operating system facilities in a distributed system. Some of
these components (e.g. CSMA/CD and FDDI/DAS Local Area Network technology) are highly
reliable, and have no dependent failure modes. Other components may not be
very reliable but are inexpensive to replicate (e.g. microprocessors, small
disks and memory).
The aim of reliable distributed system software is to take advantage of this
hardware replication or reliability, placing, replicating or migrating processes
and data (methods/objects) so as to avoid or mask failures.
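As a rough illustration of why inexpensive replication can improve availability,
suppose each replica of a component is up with probability a, and that replicas
fail independently of one another. Both are simplifying assumptions, and the
function name and figures below are hypothetical rather than taken from the text.
The service is then available whenever at least one of n replicas is up, which
happens with probability 1 - (1 - a)^n. A minimal sketch:

```python
def replicated_availability(a: float, n: int) -> float:
    """Probability that at least one of n replicas is up.

    Assumes every replica has the same availability `a` and that
    failures are independent (an idealisation; real systems often
    share power, network or software faults).
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("availability must lie in [0, 1]")
    if n < 1:
        raise ValueError("need at least one replica")
    # All n replicas are down with probability (1 - a)^n.
    return 1.0 - (1.0 - a) ** n


if __name__ == "__main__":
    # Illustrative figure only: a component that is up 90% of the time.
    for n in (1, 2, 3):
        print(n, replicated_availability(0.9, n))
```

Under these assumptions, two replicas of a 90% available component give roughly
99% availability and three give roughly 99.9%. Dependent failure modes reduce
this gain, which is why their absence matters.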