Some Definitions.

A fault is a defect in a system that may lead to an error. It may be permanent, transient, or intermittent, and it may in fact never betray itself. An error is a piece of information in a system that results from a fault and may cause a failure when processed in good faith. A failure is a deviation in the observable behavior of the system from its specification. This can include the failure to provide some service within a specified interval.

The quality of a reliable system is often measured in terms of its Mean Time Between Failures (MTBF), its Mean Time To Repair (MTTR), and its Availability. The first reflects how often the system fails, the second how long it takes to become available again after a failure. The last is the percentage of time during which the system offers the specified service.

Fault tolerance can be achieved through many approaches. These all increase the overall availability of the distributed system despite internal faults, and they all do so at some cost to some aspect of the system's performance. In addition to these modes, timeliness brings another list of requirements: