SUPERVISION 2 - Transactions

Transaction-processing systems provide a more programmer-friendly interface to concurrency by masking its effects for snippets of code known as "transactions". The guarantees of transactional systems are described by the ACID properties: atomicity, consistency, isolation, and durability. Transaction systems shift the burden of managing concurrency away from the end programmer into databases, language runtimes, and operating systems -- and by masking its effects, allow flexibility in how the ACID properties are implemented.

Question 1: ACID properties

(a) In the context of multi-threading, "atomicity" means something different than in transaction systems. If we simply associate locks with objects in a transaction system to implement 2-phase locking (2PL), which ACID properties fall out naturally, and which require additional work? Why?

(b) Our definition of "isolation" is grounded in "serialisability": the appearance that concurrently executing transactions within a system conform to some serial execution. In lecture, we discussed "conflict serialisability", in which a schedule is considered serialisable if every pair of transactions operating on overlapping sets of objects orders all conflicting (non-commutative) operations the same way. For the following two transactions, enumerate all interleavings under the assumption that individual operations on objects are atomic. For each, is it a serial execution? A conflict-serialisable execution?

    transaction T1 {
        a = A.getBalance();
        b = B.getBalance();
        return (a + b);
    }

    transaction T2 {
        A.debit(100);
        B.credit(100);
    }

(c) In transaction systems, "atomicity" refers to transactions being committed fully or not at all. Why might it be easier to implement transaction atomicity with optimistic concurrency control than with 2PL?

(d) One limitation of 2PL is that it can suffer "cascading abort", as it implements "isolation" rather than "strict isolation".
In the above transactions, B.credit(100) may abort due to insufficient funds; illustrate a 2PL schedule in which aborting T2 triggers a cascading abort of T1. Why would strict 2PL have prevented this problem?

NB: This question has been modified from the original; "A.debit(100)" has been replaced with "B.credit(100)", and T1/T2 have been swapped.

(e) "Durability" ensures that if a transaction is reported as committed to the submitter, the results will persist across a variety of failure modes, including power loss. However, the obvious solution of simply writing out object updates to disk before returning success may encounter problems with atomicity -- under what circumstances might this occur?

(f) One technique for implementing durability in transaction systems is write-ahead logging. In this scheme, a balance is sought between a pure log-based store (in which all object accesses are stored as an infinitely long, append-only log of changes) and simple on-disk object stores in which all stores are performed "in place" (such as simple inode-based file systems). What effect does varying the maximum log size have on:

    (i) Efficient use of persistent storage space?
    (ii) Transaction throughput?
    (iii) Recovery time following a crash?

(g) Write-ahead logging uses the atomicity and isolation properties of single-sector writes to build atomicity and durability properties for transactions.

    (i) What might go wrong if single-sector writes were not atomic?
    (ii) What might go wrong if disks did not properly serialise stores -- e.g., did not implement a "happens-before" relation between sequential synchronous disk writes?

Question 2: 2-phase locking

2-phase locking is one scheme to implement isolation in the presence of concurrent transaction processing on multiple objects. It consists of two monotonic phases: lock expansion and lock contraction.
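The two monotonic phases can be sketched as a small transaction-side lock wrapper. This is a minimal Python sketch under our own naming (the `TwoPhaseTxn` class and its methods are illustrative, not part of the question); note that releasing all locks only at the end makes it *strict* 2PL:

```python
import threading

# A minimal sketch of a 2PL transaction wrapper (illustrative names):
# locks may only be acquired during the expansion phase, and once the
# first lock is released, no further lock may be taken.
class TwoPhaseTxn:
    def __init__(self):
        self.held = []          # locks acquired, in acquisition order
        self.expanding = True   # phase 1: expansion; phase 2: contraction

    def lock(self, lk: threading.Lock) -> None:
        # Acquiring after contraction has begun would violate 2PL.
        assert self.expanding, "2PL violated: acquire after first release"
        if lk not in self.held:
            lk.acquire()
            self.held.append(lk)

    def finish(self) -> None:
        # Contraction phase: release everything. Holding all locks until
        # commit/abort (rather than releasing each as soon as its object
        # is no longer needed) is strict 2PL, which avoids cascading
        # aborts at the cost of reduced concurrency.
        self.expanding = False
        while self.held:
            self.held.pop().release()

# Example: a transaction touching two objects guarded by locks A and B.
A, B = threading.Lock(), threading.Lock()
t = TwoPhaseTxn()
t.lock(A)       # expansion...
t.lock(B)       # ...still expanding
# ... read/write the objects guarded by A and B here ...
t.finish()      # contraction: all locks released at once
```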
For this question, we will work with the following transactions:

    transaction T1 {
        x = A.read();
        B.write(x);
    }

    transaction T2 {
        x = B.read();
        C.write(x);
    }

    transaction T3 {
        x = C.read();
        A.write(x);
    }

Imagine that the transaction scheduler, in an effort to offer fairness to transactions, interleaves scheduling with lines of code in a round-robin manner. For example, without the intervention of locking, it would schedule two transactions as follows:

    (First epoch)  T1.L1 T2.L1
    (Second epoch) T1.L2 T2.L2
    ...

If a transaction is blocked on a lock, then the scheduler will skip that transaction and proceed to the next transaction in the epoch.

(a) If we use naive 2PL to implement transactions T1, T2, and T3, what schedule will be selected?

(b) Deadlock presents a serious challenge to 2-phase locking in the presence of composite operations. Imagine that the transaction system implements a simple deadlock detector in which, if there are in-flight transactions but none can be scheduled, then all in-flight transactions will be aborted (and their locks released) to restart in the next epoch. What schedule arises with the above three transactions?

(c) Livelock can also be a challenge for 2PL in the presence of a naive deadlock-resolution scheme. How might the scheme in (b) be modified to provide guarantees of progress in the event that a deadlock is detected?

Question 3: Timestamp ordering (TSO) and optimistic concurrency control (OCC)

Timestamp ordering allows transactions to operate concurrently based on a serialisation selected as transactions start. Each transaction is issued a timestamp (sequence or ticket number) that can be checked against object timestamps to detect conflicts, and to taint objects written or read so as to trigger detection of conflicts with other transactions that touch the same object later.
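The timestamp checks described above can be sketched concretely. This is a minimal Python sketch, with class and field names of our own choosing (the question does not prescribe an implementation): each object records the newest read and write timestamps, and an operation arriving "too late" in timestamp order forces an abort.

```python
# A minimal sketch of TSO conflict detection (illustrative names only).
class Abort(Exception):
    pass

class TSObject:
    def __init__(self, value=0):
        self.value = value
        self.read_ts = 0    # timestamp of the newest transaction to read
        self.write_ts = 0   # timestamp of the newest transaction to write

class TSOTxn:
    counter = 0

    def __init__(self):
        TSOTxn.counter += 1
        self.ts = TSOTxn.counter    # serialisation position fixed at start

    def read(self, obj):
        # Reading an object already written by a "later" transaction
        # would contradict the chosen serialisation: abort.
        if self.ts < obj.write_ts:
            raise Abort(f"t{self.ts}: object already written at {obj.write_ts}")
        obj.read_ts = max(obj.read_ts, self.ts)
        return obj.value

    def write(self, obj, value):
        # Writing under a "later" transaction's read or write: abort.
        if self.ts < obj.read_ts or self.ts < obj.write_ts:
            raise Abort(f"t{self.ts}: conflicting write")
        obj.value, obj.write_ts = value, self.ts

# Example: the older transaction reads after the newer one has written.
A = TSObject()
t1, t2 = TSOTxn(), TSOTxn()     # timestamps 1 and 2
t2.write(A, 10)                 # fine: A.write_ts becomes 2
try:
    t1.read(A)                  # t1 is older than A's writer: conflict
    aborted = False
except Abort:
    aborted = True
```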
(a) Timestamp ordering is conservative, in that it commits to a particular serialisation of a transaction attempt at the moment that the timestamp is selected, causing it to potentially reject other valid serialisations. Give an example of two transactions, timestamp assignments, and an interleaved schedule in which a valid serialisation is rejected by TSO, leading to an unnecessary abort.

(b) TSO aborts transactions as conflicting operations are detected. Why, informally, might this lead us to believe that TSO could suffer from livelock under contention?

Optimistic concurrency control likewise allows transactions to operate concurrently; unlike TSO, it "searches" for a valid serialisation on transaction commit, rather than at transaction start. This is argued to give OCC greater flexibility in avoiding the unnecessary aborts seen in TSO, which might exclude valid serialisations.

(c) Give an example of two transactions and a schedule that would be accepted by OCC, but rejected by TSO.

(d) OCC executes transactions against local copies of data ("shadows") rather than globally visible original copies, which avoids the need to explicitly handle cascading aborts. However, copying all objects at the start of the transaction is problematic if the set of objects to be operated on is determined as part of the transaction itself. Why does on-demand copying of objects complicate transaction validation?

(e) OCC aborts transactions as conflicting commits are detected. Why, informally, might we argue that this conflict behaviour is more resistant to whole-system livelock than TSO?

(f) How could OCC's validation model lead to starvation? Describe a scenario in which a set of transactions and a serialisation cause starvation for a transaction.
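The shadow-copy and commit-time validation machinery in (c)-(f) can be sketched as follows. This is a minimal Python sketch of one possible OCC design (backward validation against per-object versions; all names are ours, and real systems differ in how read/write sets and validation windows are tracked):

```python
# A toy OCC store: reads and writes go through per-transaction shadow
# state, and a transaction validates at commit by checking that nothing
# it read was overwritten by a transaction that committed meanwhile.
class OCCStore:
    def __init__(self, initial):
        self.store = dict(initial)
        self.version = {k: 0 for k in initial}  # commit count per object

    def begin(self):
        # A transaction is just its shadow state: versions it has read,
        # and writes buffered locally until commit.
        return {"reads": {}, "writes": {}}

    def read(self, txn, key):
        if key in txn["writes"]:                # read your own shadow write
            return txn["writes"][key]
        txn["reads"][key] = self.version[key]   # remember the version seen
        return self.store[key]

    def write(self, txn, key, value):
        txn["writes"][key] = value              # buffered; not yet visible

    def commit(self, txn):
        # Validation: abort if any object read has since been committed
        # by another transaction; otherwise install all writes "at once".
        for key, seen in txn["reads"].items():
            if self.version[key] != seen:
                return False                    # conflicting commit: abort
        for key, value in txn["writes"].items():
            self.store[key] = value
            self.version[key] = self.version.get(key, 0) + 1
        return True

# Example: T1's read of A is invalidated by T2's intervening commit.
db = OCCStore({"A": 1000, "B": 0})
t1, t2 = db.begin(), db.begin()
x = db.read(t1, "A")            # T1 reads A = 1000
db.write(t2, "A", 900)          # T2 debits A in its shadow copy
committed_t2 = db.commit(t2)    # True: first to commit, validates trivially
db.write(t1, "B", x)            # T1 writes based on a now-stale read
committed_t1 = db.commit(t1)    # False: A changed since T1 read it
```

Because T1 only ever touched shadow state, its abort discards local copies and nothing cascades to other transactions.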