File replication systems are one manifestation of replicated
persistent data. We will talk about replicated files in the rest of
this section, but the concepts are generally applicable to other
objects, including parts of files.
There are a number of techniques that are used to provide
replication, and they each have their costs and benefits.
Replication is used to provide a number of transparencies:
Fault Transparency or Tolerance - components in a distributed system
can fail independently. A file can be made more available in the face
of file-server failures if copies are held on more than one file
server. Availability is one of the main measures of the benefit of a
replication algorithm.
Performance - if a file is 'nearer' a user, access will generally
be faster. Files may be replicated across local storage devices at
creation time, or they may be 'cached' when first accessed. Such
caching may be on the basis of a few blocks of the file (NFS), the
whole file (AFS 1) or as much of the file as possible (AFS 3).
Portability - if mobile operation is required, file access should
remain possible after a workstation is detached from the network.
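The whole-file caching mentioned under Performance (the AFS 1 style) can be sketched as follows: the first open copies the entire file from the server to local storage, subsequent reads and writes hit the local copy, and close writes the file back. This is a minimal illustration only; the class and method names (FileServer, WholeFileCache) are hypothetical, not any real file system's API.

```python
class FileServer:
    """Stand-in for a remote server holding the master copies."""
    def __init__(self):
        self.files = {}

    def fetch(self, name):
        return self.files[name]

    def store(self, name, data):
        self.files[name] = data


class WholeFileCache:
    """Client-side cache that operates on whole files, not blocks."""
    def __init__(self, server):
        self.server = server
        self.cache = {}                      # name -> local copy

    def open(self, name):
        if name not in self.cache:           # first access: fetch whole file
            self.cache[name] = self.server.fetch(name)

    def read(self, name):
        return self.cache[name]              # served locally, no network

    def write(self, name, data):
        self.cache[name] = data              # update the local copy only

    def close(self, name):
        self.server.store(name, self.cache[name])   # write back on close


server = FileServer()
server.files["notes.txt"] = "v1"
client = WholeFileCache(server)
client.open("notes.txt")
client.write("notes.txt", "v2")
client.close("notes.txt")
print(server.files["notes.txt"])   # prints "v2": server updated at close
```

Note that between write and close the server still holds the old version, which is exactly the consistency question the rest of this section addresses.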
Replication carries with it several overheads. These include:
Storage costs. Whole-file copies are relatively expensive in space.
Consistency of Updates. The update of a replicated file must be
coordinated so that subsequent accesses perceive a consistent object;
this coordination has an associated execution and communication cost.
Complexity. Some replication schemes require complex software to
implement, with an associated implementation and maintenance cost.
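The communication cost of coordinating updates can be made concrete with a toy read-one/write-all scheme: every update must reach all N copies, while a read can be served by any single copy. The message counter below is purely illustrative, not a real protocol.

```python
class ReplicatedFile:
    """Toy read-one/write-all replica set with a message counter."""
    def __init__(self, n_replicas, data=""):
        self.replicas = [data] * n_replicas
        self.messages = 0                     # replica messages sent so far

    def write(self, data):
        for i in range(len(self.replicas)):   # update must reach every copy
            self.replicas[i] = data
            self.messages += 1

    def read(self):
        self.messages += 1                    # any single copy suffices
        return self.replicas[0]


f = ReplicatedFile(5)
f.write("v1")
print(f.messages)   # prints 5: one write costs N messages
f.read()
print(f.messages)   # prints 6: a read costs only one
```

The asymmetry is the point: write-all makes reads cheap and consistent but makes updates pay a cost proportional to the number of replicas, and blocks updates if any copy is unreachable.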
Replication techniques to provide consistency
can be divided into two main classes:
Optimistic schemes assume faults are rare and implement recovery
schemes to deal with any inconsistency that arises.
Pessimistic schemes assume faults are more common, and attempt to
ensure consistency of every access. Schemes that allow access when not
all copies are available use voting protocols to decide if enough
copies are available to proceed.
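The voting idea can be sketched in the style of Gifford's weighted voting: with N copies, a read needs R votes and a write needs W votes, chosen so that R + W > N (every read quorum overlaps every write quorum) and W > N/2 (no two writes can proceed concurrently on disjoint quorums). Version numbers identify the most recent copy within a quorum. All names below are illustrative, under those stated assumptions.

```python
N = 5
R, W = 3, 3                  # R + W = 6 > N and W > N/2: quorums overlap
assert R + W > N and W > N / 2

# Each copy carries a version number and a value.
copies = [{"version": 1, "value": "old"} for _ in range(N)]

def write(available, value):
    """A write proceeds only if at least W copies are reachable."""
    if len(available) < W:
        return False
    new_version = max(c["version"] for c in available) + 1
    for c in available:
        c["version"] = new_version
        c["value"] = value
    return True

def read(available):
    """A read proceeds only with R copies; because every read quorum
    intersects the last write quorum, the highest version among the
    quorum is guaranteed to be the latest value."""
    if len(available) < R:
        return None
    return max(available, key=lambda c: c["version"])["value"]

write(copies[:3], "new")     # update reaches a write quorum (3 copies)
print(read(copies[2:]))      # prints "new": read quorum overlaps at copy 2
```

If too few copies respond, the operation is simply refused rather than risking an inconsistent result, which is the pessimistic trade-off: consistency is preserved at the price of availability.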