Failures

There are three classes of failure in a distributed transaction system:

The client can fail.
The transaction service (including the communications channel) can fail.
The stable storage can fail. (Stable storage is storage, such as disk or tape, whose contents persist through a power outage. It still has failure modes such as physical damage, but these are normally much rarer than power loss.)

If the client fails, it does so either during the transaction or after commitment. In the first case, the service simply undoes the transaction; in the second, the transaction's effects are already committed, so there is nothing to undo. If the transaction service fails, the client and server can either wait for recovery and retry, or independently assume failure and wait to issue aborts/undos. If the stable storage fails, the enterprise should acquire more reliable hardware by spending more money! The first two classes (client failure and transaction service failure) involve recovery mechanisms. We describe techniques for these next.
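
As an illustration of the client-failure case, here is a minimal Python sketch of a service that undoes an uncommitted transaction when its client fails and leaves a committed one alone. It is not taken from these notes: the class TransactionService, its methods, and the in-memory undo log are all hypothetical names chosen for the example, and it assumes a single active transaction at a time with no concurrency control and no stable storage.

```python
# Hypothetical sketch: an in-memory service that keeps an undo log per
# transaction so that an uncommitted transaction can be rolled back.
_MISSING = object()  # marks "key did not exist before the write"


class TransactionService:
    def __init__(self):
        self.data = {}        # current values
        self.undo_logs = {}   # txn_id -> list of (key, previous value)

    def begin(self, txn_id):
        self.undo_logs[txn_id] = []

    def write(self, txn_id, key, value):
        # Record the old value before overwriting it.
        old = self.data.get(key, _MISSING)
        self.undo_logs[txn_id].append((key, old))
        self.data[key] = value

    def commit(self, txn_id):
        # After commitment the undo log is no longer needed.
        del self.undo_logs[txn_id]

    def client_failed(self, txn_id):
        log = self.undo_logs.pop(txn_id, None)
        if log is None:
            return  # failure after commitment: nothing to undo
        # Failure during the transaction: restore old values, newest first.
        for key, old in reversed(log):
            if old is _MISSING:
                self.data.pop(key, None)
            else:
                self.data[key] = old
```

For example, if transaction t1 writes x = 1 and commits, and t2 then writes x = 2 and y = 3 before its client fails, calling client_failed("t2") restores the state left by t1. A later client_failed("t1") changes nothing, because t1 had already committed.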