A model of failures gives us a handle on how to provide fault
transparency. At some level of a system, a fault occurs. If we provide
a mechanism to mask it, the system can continue to operate correctly. If
we do not, the fault may result in a failure.
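
As a rough illustration of masking, the sketch below retries an operation so that a transient lower-level fault never reaches the caller. The names TransientFault, make_flaky and with_masking are hypothetical, and retrying is only one of many possible masking mechanisms; it is an assumption made for this example, not a method prescribed by these notes.

```python
class TransientFault(Exception):
    """Hypothetical exception standing in for a fault at a lower level."""


def make_flaky(faults_before_success):
    """Return an operation that raises that many times, then succeeds."""
    remaining = [faults_before_success]

    def operation():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise TransientFault("lower-level fault")
        return "ok"

    return operation


def with_masking(operation, attempts=3):
    """Retry so that up to attempts - 1 faults stay hidden from the caller."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientFault:
            if attempt == attempts - 1:
                raise  # mask exhausted: the fault surfaces as a failure
```

With two faults and three attempts the caller simply sees "ok"; with three faults the mask is exhausted and the fault becomes a visible failure. The extra attempts are the price paid for the masking.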
Failures are usually categorised as follows:
Each type of fault has an appropriate method for hiding it so that the
corresponding failure does not occur. Of course, each method will have
an associated cost.
A crash is where a system ceases to run the programs that are
A system that fails "silent" is one that crashes and does not report its
fault, and may even return valid (but meaningless) responses to
A "Fail Stop" system is one that guarantees that it will give no
response when it crashes.
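
One consequence of the fail-stop guarantee is that a client may treat the absence of a response as evidence of a crash. The sketch below simulates this with two queues and a thread. The server, the doubling "service" it performs, and the 0.2 second timeout are all assumptions made for illustration only.

```python
import queue
import threading


def fail_stop_server(requests, replies, crash_after):
    """Hypothetical server: answers crash_after requests, then halts silently."""
    served = 0
    while True:
        req = requests.get()
        if served >= crash_after:
            return  # "crash": stop running and never respond again
        replies.put(req * 2)
        served += 1


def call(requests, replies, value, timeout=0.2):
    """Send a request; return the reply, or None if no response arrives."""
    requests.put(value)
    try:
        return replies.get(timeout=timeout)
    except queue.Empty:
        return None  # no response: under fail-stop, treat as a crash


requests, replies = queue.Queue(), queue.Queue()
threading.Thread(
    target=fail_stop_server, args=(requests, replies, 2), daemon=True
).start()
results = [call(requests, replies, v) for v in (1, 2, 3)]
```

Here the first two calls are answered and the third times out, so the client can conclude that the server has stopped. A timeout alone cannot distinguish a crashed server from a slow one; it is the fail-stop guarantee that makes the inference safe.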
A commission fault is one where a system silently fails to update
the state of some variable in stable storage (say, disk), but appears to
A value fault is said to have occurred after some
sequence of updates where the state in stable storage is not what was
A timing fault is where a system fails to meet some deadline.
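
A timing fault can be detected by comparing elapsed time against the deadline. The following is a minimal sketch, assuming a deadline expressed in seconds and a hypothetical helper named run_with_deadline; it only reports a missed deadline after the fact and does not interrupt the task.

```python
import time


def run_with_deadline(task, deadline_s):
    """Run task() and report whether it overran deadline_s seconds."""
    start = time.monotonic()  # monotonic clock: unaffected by wall-clock changes
    result = task()
    elapsed = time.monotonic() - start
    return result, elapsed > deadline_s


def fast_task():
    return "done"


def slow_task():
    time.sleep(0.05)  # deliberately longer than the deadline used below
    return "done"


_, fast_missed = run_with_deadline(fast_task, deadline_s=1.0)
_, slow_missed = run_with_deadline(slow_task, deadline_s=0.01)
```

Note that both tasks return the correct value; only the second one commits a timing fault, which is why the deadline check is kept separate from the result.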