CONCURRENT AND DISTRIBUTED SYSTEMS LECTURE 7 TRANSCRIPT (C) 2023 DJ Greaves, University of Cambridge, Computer Laboratory. Concurrent Systems Lecture 7: Isolation vs. Strict Isolation, 2-Phase Locking (2PL), Time Stamp Ordering (TSO), and Optimistic Concurrency Control (OCC) L7P03: Last time: isolation – serialisability Last time we looked at some alternatives to using shared variables for providing concurrency and communication, such as active objects and explicit message passing in Occam. Then we moved on to look at making sequences of operations atomic. For this, we defined the notion of the transaction and recapped the ACID properties. We mentioned that we needed isolation, which is the concept that a transaction occurs in isolation from any other, either before or after. But in reality, we appreciate that the system will run many transactions concurrently, and we require the effect of that concurrent overlapping to be as though the transactions ran in some serial order. If we have that property, then we say that the effect was serializable. We saw that one way of determining whether the consequences of running several transactions at once were serializable is to look at a history graph of the operations that each transaction makes on the individual mutable variables. Of course, in general, it might not just be variables that are being operated on; it could be larger objects of various sorts, or files on the file system, and so on. The idea of executing transactions serially, one after the other, is a useful model, but in reality we do want to run the transactions concurrently; the result, though, should be as though they ran serially. L7P04: Isolation – serialisability We considered these two transactions, T1 and T2, which operate on 2 shared resources. There's a total of 6 possible interleavings of these four operations on the two resources. Here are two possible interleavings that don't work. In the first one, transaction one sees the old value of S but the new value of C, so that's neither serial nor serializable. And in the second one here, T1 again sees inconsistent values: it sees the new S and the old C. So both of these interleavings are invalid. In a transaction processing system, the end effect must be equivalent to some serial execution of the transactions. The effects of a bad schedule are invalid and they cannot be committed or returned to the end user. But the loss of performance arising from rigorously forcing strict isolation can be too severe as well, so we need to look at approaches which sit in the space in between. We'll draw the distinction between strict and non-strict isolation. L7P05: THIS TIME If we end up with a bad schedule, a bad interleaving, then some of the work will have to be thrown away, but maybe we won't have to throw all of it away. Ultimately, we're going to consider 3 approaches in this middle ground: two-phase locking with rollback, timestamp ordering and optimistic concurrency control. These are all different ways in which the transaction system can be implemented to provide the ACID guarantees. These will each make a different trade-off between the amount of work we have to repeat and the level of concurrency that we achieve, and different systems will be appropriate in different circumstances, according to, for instance, how frequently data is changed, and also how commonly it occurs that two transactions which run concurrently do interfere with each other.
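As a concrete illustration of the kind of bad interleaving just described, here is a minimal sketch. The particular operations are assumptions made for illustration (the slide itself defines T1 and T2): T2 transfers an amount from resource S to resource C, while T1 reads both.

```python
# Hypothetical sketch of a non-serializable interleaving. S and C are the two
# shared resources; the operation bodies are illustrative assumptions only.
S, C = 100, 0

def t2_debit_S():            # T2, first operation
    global S; S -= 50

def t2_credit_C():           # T2, second operation
    global C; C += 50

# One of the six possible interleavings of the four operations:
s_seen = S                   # T1 reads the OLD value of S (100)
t2_debit_S()
t2_credit_C()
c_seen = C                   # T1 reads the NEW value of C (50)

print(s_seen + c_seen)       # 150: no serial order of T1 and T2 gives this,
                             # since serially T1 would see 100 + 0 or 50 + 50
```

Running either transaction entirely before the other gives T1 a consistent total of 100; the interleaving above does not, so it is neither serial nor serializable.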
L7P06: Effects of bad schedules Let's consider what is the effect of a bad schedule. Suppose we have two transactions running at once, which are logically supposed to be running one after the other. Suppose T1 updates (i.e. writes) an object, but then this is overwritten by T2. Well, this is what we call a write-after-write (WaW) conflict and the T1 update will of course be lost. We can also have a DIRTY READ. This is where T1 reads an object that has been updated by an uncommitted transaction T2. Well, that might be alright if T2 is going to commit and logically come first in the overall serialisation. But what if T2 doesn't commit? If T2 aborts, then T1 has read information that nominally doesn't even exist. That's called a read-after-write conflict, sometimes written as RaW. We're going to see shortly that aborts might have to happen more often than we might otherwise want. And there's a third conflict. This is where T1 reads an object, which is then updated by T2. Well, that's going to be fine if T2 is going to end up after T1 in the logical serialisation. But supposing T1 reads that value again, the way it happens to be coded: well, it will get a different value, and that's not going to make sense for any serialisation. This is called a write-after-read (WaR) conflict, also known as an anti-dependence. If T1 is going to read that value again, then it's said to be anti-dependent on T2 (i.e. it's dependent on T2 not changing the value before it has finally read it for the last time). For simple scalars we could code our reads to store such data in local variables inside the transaction code, but this is not suitable for enormous arrays or other collection types that our leaf operations might be performed on. There's also nominally the idea of a read-after-read conflict, but reads do not conflict of course, so RaR conflicts do not exist. But all of these other conflicts lead to a lack of isolation and cannot be allowed. L7P07: Isolation and strict isolation Ultimately, we need to avoid these conflicts, and there are two ways of achieving it. One is strict isolation and the other one is non-strict isolation. With strict isolation, we stop these problems happening in the first place. With non-strict isolation we allow them to happen potentially, but we'll have to put checks in place, and if they do occur then we'll have to take corrective action. With the strict approach, there's the potential problem of loss of concurrency and hence loss of performance. Whereas with the non-strict approach we behave optimistically and hope that the corrective action happens sufficiently rarely that we get a good performance gain on average. We looked before at the use of conflict graphs to see whether two transactions interfere with each other. This can be pre-computed statically at compile time by code analysis, or at least an efficient digest of the code structure can be constructed so that rapid determination is possible at run time, and therefore we can enforce strict isolation statically, or nearly statically, in the coding of certain faster transactions. But those which are highly complex aren't going to work very well this way. This means that non-strict isolation is very commonly used, and we'll be looking at techniques for that in the rest of this lecture. But the consequences of non-strict isolation are not limited to just one transaction.
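As a small illustration of the write-after-read case, here is a hedged sketch; the object name, values and transaction bodies are illustrative assumptions, not anything from the slides.

```python
# Hypothetical single-threaded simulation of a write-after-read (WaR) conflict.
balance = 100

def t2_update():
    global balance
    balance = 150                # T2 updates the object that T1 has already read

# T1, coded naively, reads the shared object twice:
first = balance                  # T1 reads 100 ...
t2_update()                      # ... T2's write interleaves (the anti-dependence)
second = balance                 # ... and the re-read sees 150: neither "T1 then
print(first, second)             # T2" nor "T2 then T1" could produce (100, 150)

# For a simple scalar, T1 could instead read once into a local variable and use
# that copy throughout, but this trick does not scale to enormous arrays or
# other large collections that our leaf operations might work on.
```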
I said that we occasionally have to repeat a transaction, but what if we have to repeat a transaction that was run concurrently with another transaction and which altered the behaviour of that second transaction? Well, that second transaction is also going to have to be rerun. So if the first transaction aborts then the second one is going to have to be aborted as well. This is what we call a CASCADING ABORT and the cascade chain can be any length up to the level of concurrency that we achieved. But with either strict or non-strict isolation, the overall essence and the observed behaviour is that each transaction appeared to operate in perfect isolation. L7P07: Enforcing isolation Let's look at three ways of enforcing isolation. Two-phase locking, as you can guess from its name, involves taking out locks on the objects that we wish to manipulate before making any manipulations. Timestamp ordering is going to be a way of checking whether the transactions did occur in a particular temporal order. And optimistic concurrency control is going to be based much more on the probability of clashes being low, so we can proceed in a rather more cavalier fashion. Each of these techniques is described in the 2003 book by Bacon and Harris of this department, but you'll also find plenty of online resources these days. We've looked at locks in the earlier part of the course and we assume there will be such a lock for every object that we're going to access as part of a transaction. Reads are, generally speaking, more common than writes, and so a multiple-reader, single-writer lock is going to help us with efficiency here. L7P08: Two-phase locking (2PL) Two-phase locking uses two phases: the expanding phase, during which locks are acquired but none are released, and then the shrinking phase, during which locks are released and no further locks are acquired. The operations on the objects can be performed in either phase, provided we hold the lock for that object at that point. Two-phase locking can be used either with strict or non-strict isolation and it guarantees serializable execution. Here's a concrete example. It's the section of code called transaction. In this subroutine there's an IF statement whose body is a piece of straight-line code. If we look carefully at this, we'll see that all of the locks are acquired before any of them are released. This gives us our expanding phase followed by our shrinking phase. L7P09: 2PL example Before we worry about what this piece of code does, let's just do a quick static check that all of the locks which are acquired are eventually released, regardless of what control-flow route execution happens to take through this code. Well, there's only one IF statement. It has two arms and both of them do indeed have that property. Also, we note that the last thing that this code does in either branch of the IF is to call the TRY_COMMIT primitive. The first thing the code does is take out a read lock (a reader lock) on A. It is then going to be able to call the get-balance method of A, and we'll be able to test the value that's returned. If it's greater than our target amount, then we'll take out a write lock on A, so that's going to be an upgrade of A from read lock to write lock. We're using a set of primitives we've not quite seen before, where a read lock can be upgraded to a write lock. We see it has two unlock operations, WRITE_UNLOCK and READ_UNLOCK. WRITE_UNLOCK might then restore the lock to a read state. READ_UNLOCK might restore it to being completely unlocked.
Maybe they are just separate mutexes rather than an upgradable lock: the details don't matter too much. That TRY_COMMIT primitive takes a boolean argument. If false, we request an ABORT, which will roll back, and the return code to any caller will be false. If true, then provided a proper serialisation is achieved, a COMMIT will take place and the value returned to the caller will be true, but it may still abort instead. These TRY_COMMITs are essentially the same thing as RETURN statements, but we've decorated them with concrete syntax here. In fact, a RETURN statement inside a method defined with the TRANSACTION keyword might typically apply the commit semantics automatically. We'll go on to study precisely what needs to be done by that commit return. One other thing to observe with this code is that it's structured hard-and-fast into an expanding phase and a shrinking phase. But within those two phases we try to acquire locks as late as possible in the expanding phase and unlock them as early as possible in the shrinking phase: this is going to help maximise concurrency. L7P11: Problems with 2PL Obviously, there's quite a lot of locking going on here. If we had to manually code this, it could be pretty bug-prone, although a checker might check it for us; or indeed, if we can check it, then we might as well automatically insert the locks with some sort of source-to-source translation on the source code, or it could just be built into our high-level language anyway. There's also, of course, the risk of deadlock, but we've seen that we can avoid deadlock by imposing an ordering on the locks and making sure that we take them out in that order. Although again, that could lead to a loss of concurrency opportunities if the order in which we have to take them out is not optimal for the control flow we locally have, where we're dynamically determining which lock to take out next. So perhaps we should then tune the global order to minimise the consequences of the global order denying us concurrency. But as we'll shortly see, we will be able to abort a transaction and rely on the built-in rollback mechanisms. So if there was a lock that we should have taken out before one that we have already taken out, we can abort and go back and do it again, taking out that lock first. And if we knew that was likely to happen on a regular basis, we would have taken out the lock in the first place. When we're writing code for two-phase locking, we can structure it either so that we get strict isolation or so that possibilities for non-strict behaviour are in there. If you look back at the previous slide's code, you'll see that the very first thing we do in the shrinking phase is unlock B. But during the shrinking phase, we go on to mutate A: we're going to add some interest to A. So supposing that thread T1 has just updated B and has now entered its shrinking phase and is going to unlock the write lock on B; then another thread T2 can come along and make an access to B. It can read it, write it, or otherwise access it, but it will be operating on the uncommitted value of B, because transaction T1 hasn't committed yet. (The printed hard copy may say T1 updates A, but it's better to say B for this example.) So now T2's fate is tied to T1. T2 has operated on data updated by T1, but T1 has not yet committed. So if T1 aborts, then the data that T2 operated on was dirty, stale data and T2 cannot be allowed to continue processing. It too must abort. Hence we have a CASCADING ABORT.
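Here is a hedged reconstruction of the transaction just discussed, in Python-flavoured pseudocode. The lock and commit primitives (read_lock, write_lock, write_unlock, read_unlock, try_commit) and the account operations are assumptions standing in for the slide's actual primitives, not the course's real API.

```python
# A sketch of the (non-strict) 2PL example described above, under assumed
# primitives. All locks are acquired before any are released: the expanding
# phase is followed by the shrinking phase.
def transaction(A, B, amount):
    read_lock(A)                        # expanding phase starts here
    if A.get_balance() > amount:
        write_lock(A)                   # upgrade A: read lock -> write lock
        A.debit(amount)
        write_lock(B)                   # last lock acquired
        B.credit(amount)
        write_unlock(B)                 # shrinking phase begins
        A.add_interest()                # A is still write-locked, but B is now
                                        # visible before we commit: non-strict!
        write_unlock(A)                 # perhaps drops back to a read lock
        read_unlock(A)
        return try_commit(True)         # commit requested; may still abort
    else:
        read_unlock(A)
        return try_commit(False)        # request an abort; caller gets false
```

The stricter version on the next slide keeps exactly this shape but moves every unlock after the last operation on shared data (and the most extreme style takes all the locks out up front), trading concurrency for freedom from cascading aborts.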
Our transaction processing system will keep track of all these dependencies and implement both the cascade and the rollback mechanisms still to be discussed. But let's look at how we can code this same example in a strict way. L7P12: Strict(er) 2PL example Here's one way to make this code stricter. What we have done is we have taken operations out of the shrinking phase and instead put all of the unlocks towards the end. Our coding paradigm now is to take out locks in the expanding phase as we need them, but not to release any until we have completed all operations on shared data, and then to release all of them. The UNLOCK primitive is going to release the lock regardless of whether it was a read or a write lock. As we hold locks for longer, we're going to have increased contention for the locks. We'll get less parallelism in the system, but we should get fewer aborts from isolation failure. And the most extreme coding style, of course, is to take out all the locks that we might possibly need before we do anything at all, and release them all at the very end, after we've done everything. L7P13: 2PL: rollback Now recall that a transaction may ABORT. By abort we mean stop its progress and make none of the changes that it's made apparent to anything else. We've just seen that if we notice a violation of isolation, then we should abort, but the transaction may also choose to abort for many other reasons. Now with two-phase locking, the isolation is going to work. But what should we do about aborting? After all, we are going to be directly modifying the mutable values. This is where ROLLBACK comes in. Rollback is the process of undoing the actions that an aborting transaction may have performed already. Remember the letter D from the ACID properties: D is durability. A transaction can abort any time up to its commit. It then becomes durable and cannot be aborted. Of course, at a higher level we can issue a correcting transaction that has the effect of reversing the situation, but that's above the transaction processing system. That's what happens in banking, for instance, or accountancy in general, where something's been done wrong: we don't undo what was done. Instead we issue a cheque or something in the reverse direction to correct the problem. L7P14: Why might a transaction abort? Why might a transaction abort? Some failures are internal to the transaction system. We've seen that if transaction T2 depends on T1, and T1 aborts, then T2 must be aborted. Also, if we detect deadlock, then we need to abort one of the participants in the deadlocking cycle. Or we could have other systems-level problems such as a crash or running out of memory, and so on. On the other hand, we can also see a transaction self-abort. For instance, if we're trying to move money but the source account doesn't have sufficient capital, then the debit operation on the source account will fail and therefore we need to abort the whole transaction. In the two-phase locking example just now, we saw the use of a return code so that the programmer is aware whether the transaction has progressed successfully or whether it has failed. But at a lower level, we can also have abort, rollback and retry. This is especially going to be the case for the optimistic concurrency systems, and that won't necessarily be visible to the application-level programmer. L7P15: Implementing rollback: undo How can we implement transaction rollback?
There are various mechanisms in general, and in real-world systems probably some hybrid of the basic techniques is going to be used. The first technique is a simple UNDO. Undo requires that we maintain a log. The log contains a list of operations that we have completed as part of the transaction. If the transaction needs to abort and undo, then we just reverse each operation, in the reverse order to that in which they happened, so undoing the most recent one first. Undo in reverse order is generally possible, but other orders may be invalid (e.g. shoes and socks on and off, or database record additions under a foreign-key integrity constraint). This of course requires that the operations are undoable. Sending an e-mail cannot easily be undone, but putting an e-mail in an output buffer ready to be sent can be undone. We must make sure our log has sufficient information for the undo. For instance, if we are performing a credit of an account A with a value X, then the obvious undo is a debit to the same account with the same amount. But simply storing the arguments to an operation might not be sufficient to make it reversible. If we set the balance of A to the explicit value X, then the arguments to that operation do not contain sufficient information to undo it. We must also store in the log explicit information about how to undo it, which would in this case be the previous value of the account. And there's also another consideration that we'll look at in the next lecture, which is how do we write atomically to the log? Or what if the logging mechanism should go wrong? L7P16: Implementing rollback: copy The second main approach to implementing rollback is the copy. We work on a copy of the data. Then if we need to abort our operations, we just don't commit the copy: the old version remains the version that others see. In detail, there are two basic approaches to this. One involves having a pointer, which can be atomically written, giving the address of the active copy, so that the commit operation is essentially updating that pointer to the new version. The other approach is the more brute-force technique of taking a snapshot of the original data and, if we want to abort, copying that snapshot back to where it was originally held. It could be appropriate to use block copies when we're modifying a larger data structure, such as a record with many fields (the hour, minute, second, day of the week, month and so on). And of course, there are all sorts of hybrid systems, such as copy-on-write and partial copying and so on, as you can well imagine.
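Here is a minimal sketch of the undo-log idea just described, assuming each log entry explicitly stores whatever is needed to reverse the operation (for a set-balance, that is the old value). The class and names are illustrative, not a real transaction-processing API.

```python
# A hedged sketch of an undo log: operations are recorded in do-order and
# reversed in the opposite order on abort. Names are illustrative only.
class UndoLog:
    def __init__(self):
        self.entries = []                      # (undo_fn, saved_state) pairs

    def record(self, undo_fn, saved_state):
        self.entries.append((undo_fn, saved_state))

    def rollback(self):
        for undo_fn, saved_state in reversed(self.entries):  # most recent first
            undo_fn(saved_state)
        self.entries.clear()

account = {"balance": 100}
log = UndoLog()

# set_balance(A, X) is not reversible from X alone, so log the previous value:
log.record(lambda old: account.update(balance=old), account["balance"])
account["balance"] = 250                       # the update itself

log.rollback()                                 # abort: restore the old value
assert account["balance"] == 100
```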
L7P17: Timestamp ordering (TSO) Two-phase locking and strict two-phase locking are widely used in practice. But they can limit concurrency, especially the strict form, and they must also be able to deal with deadlock. Timestamp ordering (TSO) is an alternative approach. With TSO, each time we start a transaction we get a unique timestamp allocated from a central clock, a bit like taking a ticket in Lamport's Bakery Algorithm. Every object now needs to be augmented to carry a timestamp for when it was last accessed. Or it could have two timestamps, as we'll see: one for read and one for write. But the basic mechanism works with just one timestamp, updated on every access regardless of read or write. Timestamps form a complete ordering, and we serialise the side effects in that complete ordering. It's pretty straightforward. When a transaction makes an access to an object, it needs to compare the transaction timestamp V(T) with the object's timestamp V(O). If they're in the wrong order, then we abort. The good order is the obvious one, which is that the object was last accessed before this transaction started. But if it's been accessed by a later transaction, then potentially there's going to be a non-serializable interleaving and we just abort our transaction. We don't necessarily have to tell the application programmer that we've aborted. We can keep the arguments passed into the transaction and just retry it. Normally the programmer won't need to know, but sometimes they might be interested, for performance monitoring and so on, in which case they might have the option to be told how many times the transaction was retried, or to manually implement the retry loop themselves. Thinking of this in terms of history/conflict graphs: if we can draw a diagonal line across our graph such that no edges cross it, then everything was either before or after that diagonal line, and we also need to make sure the diagonal lines don't cross each other. In the jargon, this is called a "structural happens-before" partition. L7P18: TSO Concrete Example 1 We work through a particular example. I don't think there's much point me reading it all out, but you can work it through for yourself and see that, with the interleaving considered here, both transactions can happily commit under TSO. L7P19: TSO Concrete Example 2 Here's the previous worked example again. The difference this time is with steps four and five: they are in the other order from the previous slide. Hence we're going to see a structural conflict. In this interleaving, T1 and T2 both start in the same order as before (T1, then T2), but at step 4, T2 starts to get ahead of T1, and T1 sees a timestamp that's too new, so it will have to abort. After T2 has committed, T1 can be restarted and it will get a fresh timestamp which is greater than the timestamp on all of the relevant objects. L7P20: Advantages of TSO TSO has several advantages over two-phase locking: it's deadlock-free, it allows more concurrency, and it doesn't have so much overhead handling locks. But on the other hand, every object needs to carry a timestamp. We can improve TSO by distinguishing reads and writes in the timestamps held on the objects. Let's put a little subroutine around each read and write operation. We'll call that a transactor, and this will update the read and write timestamps appropriately. For the read transactor, the first thing we need to do is check whether the timestamp on our transaction is less than the write timestamp on the object to be read. If it is, then the object has already been written by a later transaction and we must abort. Otherwise we can proceed with the read, and we update the read timestamp to be the maximum of what it currently was and the current transaction's timestamp. For the write transactor, we first of all need to check whether the transaction timestamp is less than the read timestamp on the object, and if it is then we need to abort, because we'd have a write-after-read conflict: a later transaction has already read the value that we want to change, so we can't change it. We then need to check whether our transaction time is less than the last write time for the object. If it is, then OK: our write is unimportant. The value that we're storing has logically been overwritten by a later transaction, so we don't change anything, we just return. But if both those conditions are OK, then we can perform the write, updating the object and setting its write timestamp to the transaction's timestamp.
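Pulling these rules together, here is a minimal sketch of the two transactors, assuming a central monotonic timestamp counter and per-object read and write timestamps. The names (TSOObject, tso_read, tso_write and so on) are illustrative, not the course's actual primitives.

```python
# A hedged sketch of TSO with separate read and write timestamps per object.
import itertools

_clock = itertools.count(1)             # central clock allocating timestamps

class Abort(Exception):
    """Raised when the access order is unserialisable; the caller retries."""

class TSOObject:
    def __init__(self, value):
        self.value = value
        self.read_ts = 0                # timestamp of the latest reader
        self.write_ts = 0               # timestamp of the latest writer

def begin():
    return next(_clock)                 # V(T): this transaction's timestamp

def tso_read(txn_ts, obj):
    if txn_ts < obj.write_ts:           # written by a later transaction:
        raise Abort("read too late")    # abort (and typically retry afresh)
    obj.read_ts = max(obj.read_ts, txn_ts)
    return obj.value

def tso_write(txn_ts, obj, new_value):
    if txn_ts < obj.read_ts:            # a later transaction already read it:
        raise Abort("write-after-read") # cannot change it now
    if txn_ts < obj.write_ts:           # already overwritten by a later txn:
        return                          # our write is unimportant; just return
    obj.value = new_value
    obj.write_ts = txn_ts               # record ourselves as the latest writer
```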
L7P21: However ... So TSO needs a rollback mechanism, just like two-phase locking. It doesn't provide strict isolation, so it can be subject to cascading aborts. If we want to implement strict TSO, we can augment it with locks. When we first access an object and check that its timestamp is OK, we can then take out a lock on it. This will reduce the amount of concurrency from then onwards and make things stricter. But of course that might deadlock. We can still abort if we can detect the deadlock, and use the normal transaction rollback as our recovery mechanism. TSO is still quite conservative. It decides, a priori (at the start of things), on a particular serialisation, and that might not be the best serialisation, especially given transactions that might vary in their behaviour from one run to another. And TSO does not perform well under contention: we will repeatedly have transactions aborting and retrying. But it is a good choice for distributed systems which don't have any centralised management and where conflicts are rare, so that the chances of an abort are small. L7P22: Optimistic concurrency control Strictly speaking, TSO is a form of optimistic concurrency control, because it's optimistic: it hopes that there won't be a conflict, and if there is, then it will retry. But in this course we're going to use the phrase OCC for a slightly more optimistic approach. That form of OCC is highly suitable for distributed systems and for systems implemented on web services, such as booking flights and hotels and so on. There could be quite a lot of customers browsing information, but the rate of change is comparatively low: they won't be committing a transaction particularly often. Each customer or client starts a transaction and operates on a copy of the data called a shadow copy. It's only when we come to commit that we check whether that shadow was still valid. If something else has changed the underlying data (something else committed first), then we'll have to abort. We might have several shadows: we might have one for aircraft flights and one for hotel rooms and so on. At commit we'll need to check that they are all mutually consistent with each other and that nothing has updated any of them. The system has no deadlock and it has no cascading aborts, so that's good. And rollback comes essentially for free: we just discard the shadows which have been locally updated. And a further advantage comes, as we shall see, if we can batch up some number of commits and try to submit them for committing all at once. The committing code can look at the contending commits, analyse their conflicts, see which ones are exclusive with which others, and choose an appropriate set to commit, with the remainder being aborted. L7P23: Implementing OCC (1) Let's look at implementing optimistic concurrency control. All of the objects that are going to be operated on are tagged with some sort of version or generation number or timestamp. For instance, we might store the so-called VALIDATION TIMESTAMP of the transaction which most recently wrote its updates to this object. We could store this information in the object, or possibly we could store it inside the validator as an auxiliary data structure, but normally it's with the object.
There are many potential customers or threads that wish to process that data, but we're going to assume the rate of commits of transactions is quite low. When we wish to read any object, as I said before, we shall take a shadow copy and take a note of the version number or timestamp for each shadow that's involved in the transaction that we want to commit. And when we want to write, we will edit the shadows, perhaps updating them as hidden data in an HTML web form or something like that. That would be the case if we're booking a multi-part holiday. So we will end up with local copies of various bits of perhaps several databases, and we'll want to ultimately commit them all at once. When it is commit time, we send all of our edited copies to a validator. The validator might be single-threaded or it might be multi-threaded, or it might even be a distributed system itself. Also, preferably, the validator might operate on a batch of submissions all at once, but maybe there's just one; maybe it's a simple system where there's only one submission considered at a time. But where it has got a batch, it will try to find an optimal non-conflicting subset of them to commit. And then the ones that it decides not to commit, because of conflicts and so on, will have to be retried. It will request that the "next level up" retries these. L7P24: Implementing OCC (2) Optimistic concurrency control: that phrase really covers the whole spectrum of possible approaches which follow the same basic technique. And within that there are various efficient schemes for making the individual shadow copies, such as copy-on-write and so on (which is just copying selected parts of what's going to be updated). Ultimately, everything comes down to performing various checks at commit time, and that's where all the complexity resides. There are two sorts of check that we really need to make. The first check is called read validation, and that's that all of the timestamps that were taken on all the shadows that we used did concurrently hold at some particular time. And that will become our tentative start time for the transaction. A reasonable tentative start time that we might use is the greatest of all of the timestamps stored against all of the shadows that we are using. The second validation is just for serializability, which is essentially that there aren't any conflicts with any committed transactions which have a later start time. That would tend to happen when somebody was quite slow booking their holiday, whereas somebody speedy came along after we started, booked their whole holiday and committed it while the dawdling person is still looking at data that's a bit out-of-date now: trying to book an aspect of a holiday that's no longer available or something like that. And if we have a batched-up committer, then it might choose a serialisation amongst many possible serialisations so that we're committing as many as possible of the transactions out of the batch, and rejecting as few as possible. We might be giving preference to those which were rejected under this scheme in the last batch or something like that. And of course, note that the validator itself, when it's committing transactions over multiple databases, might require a decent throughput, so it might be multi-threaded, in which case, just for the commit phase, it will have to use one of the other previously discussed concurrency control mechanisms like TSO or 2PL as it does this commit of transactions that were accumulated under the optimistic concurrency control scheme. But in a simple system, the validator operates on only one database, is fully integrated with it and is the only writer! So it need not be complicated.
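Here is a hedged sketch of such a validator, deliberately simplified to a single-threaded checker. The record format and function names are assumptions made for illustration; real systems store this bookkeeping with the objects or inside the validator, as described above.

```python
# A sketch of OCC validation: read validation plus a serialisability check
# against transactions that validated later. Names are illustrative only.
class TxnRecord:
    def __init__(self, ts, write_set, done):
        self.ts = ts                  # validation timestamp of that transaction
        self.write_set = write_set    # names of the objects it updates
        self.done = done              # True once its write-back has completed

def validate(read_versions, history, new_ts):
    """read_versions: {object name: version seen when its shadow was taken}.
    history: recent TxnRecords, committed or still writing back.
    Returns new_ts if the submission may commit, or None to abort and retry."""
    for rec in history:
        for obj, seen_version in read_versions.items():
            if obj in rec.write_set and rec.ts > seen_version:
                # Some transaction validated an update to this object after we
                # took our shadow: read validation fails (if rec.done) or the
                # serialisability check fails (if it is still writing back).
                return None
    return new_ts        # caller now writes the shadows back, stamped new_ts
```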
L7P25: OCC Example (1) Let's look at an example. There are various data structures involved; they could all be in the validator, for instance. For each object, knowledge is needed about what version it's on, and the validator needs to keep track of which transactions have committed and which ones are outstanding. Here we see a bit of the log for T5 and T6. They are history: their write-back is done. T5 was deemed to have completed at timestamp 10 and it updated objects A, B and C, so their version number will currently be 10. Similarly, T6 has also written back: it completed at timestamp 11 and it updated D, so D will have 11 as its version number. Now T7 has started, and we don't really care when it took its copies at this point. What we're saying is that it has started its write-back, and it's going to write back with validation timestamp 12: that's what we decided to use. It's going to update objects A & E to their new values and record them as being on version 12. Recall, our fundamental reason for using transactions is that we do not have lower-level atomic operations over all combinations of the different objects involved (although some might share the same database or locks, so we could have it for some combinations). It is the role of this OCC protocol to provide it. Assume A & E are held on different databases and cannot be updated under a common lock: A might be flights and E might be hotels. Further suppose that at this instant, midway through the writes, we have updated A but not E. And let's suppose B is skiing equipment hire or something like that. Our transaction T8 comes along and it's just booking a hotel and equipment but no flight, so it needs objects B & E. What's going to happen? So given that set-up, T8 comes along and it's going to update B & E. First of all it's going to take shadow copies of B & E, and these will have their timestamps associated with them: B was last updated with timestamp 10 and E with timestamp 9. When T8 has finished updating its shadow copies, it submits them for validation. As we said, there are two phases to the validation, and they both have to succeed. The first one is the read validation. We need to check that the shadows are all part of a consistent snapshot. For the objects that have been read, get their timestamps, then check: is there a recorded validation timestamp for any updater of those objects that falls after these timestamps? So we look at the latest committed validation timestamp, which is 11, because transaction T7 hasn't committed fully yet (it's on 12). So the latest committed timestamp is 11, for transaction T6, and we look at the 10 for B and the 9 for E: they're both before that, and nothing committed in between updated B or E. That's OK. This is a basis for serialisation. So now we need to check for serializability. We check T8 against all the later transactions which are still ongoing, writing back their sub-components and individually committing those operations. And there's just T7 here. And we do see a conflict: T7 updates E, but T8 has read the old E, so now we have one of these structural conflicts and T8 must abort and retry.
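Using the hypothetical validate sketch from above, the situation on this slide looks roughly like this; the object names and timestamps are taken from the example, but the code itself remains an illustration, not the actual validator.

```python
# Recent history as T8 submits: T5 and T6 are done, T7 is mid write-back.
history = [
    TxnRecord(ts=10, write_set={"A", "B", "C"}, done=True),   # T5
    TxnRecord(ts=11, write_set={"D"},           done=True),   # T6
    TxnRecord(ts=12, write_set={"A", "E"},      done=False),  # T7, in progress
]

# T8 shadowed B at version 10 and E at version 9 and wants to commit updates:
result = validate({"B": 10, "E": 9}, history, new_ts=13)
assert result is None   # T7 (timestamp 12) updates E, which T8 read at version 9,
                        # so T8 fails the serialisability check and must retry
```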
L7P27: Issues with OCC So let's review what we've seen with optimistic concurrency control. We just used a simple validator in this small work-through, and it might abort even when it doesn't need to: in general, a validator might be able to find a better start time that enables a full serialisation to exist. Compare this with TSO, which took its timestamp at the start of the transaction and had to stick with it. Here we're doing it post hoc: we're able to dynamically choose an effective transaction time, and also to consider a greater number of schedules to see if any of them serialise OK. However, optimistic concurrency control is not suitable when there are high conflict rates: it performs lots of work on stale data, which ends up being wasted. And starvation is even possible if the same conflicting set keeps coming up and has to be retried. Although I did remark previously that we could take this into account and give sets that have been stalled, or transactions that have been starved in a previous iteration, higher priority, and choose them in preference for committing this time round. But if we do have some really slow members of society who are continually basing their bids, as it were, on out-of-date data, then they're likely to lose out every time, and they may not make progress. L7P27: Isolation & Concurrency: Summary A quick summary. Two-phase locking (2PL) explicitly locks all the items as required and then releases them. It guarantees a serializable schedule and it can be done strictly, in which case we avoid the cascading aborts. It can limit concurrency, of course, especially when done strictly, and it can be prone to deadlock unless we can order our lock-taking in the appropriate way. Timestamp ordering (TSO) assigns the timestamp when the transaction starts. This won't deadlock, but it may miss serializable schedules. And it's highly suitable for distributed and decentralised systems. Optimistic concurrency control (OCC) executes with shadow copies of data and then validates at commit time. This gives us lots of concurrency and, compared with TSO, it gives us a choice over possible serialisations. Again, there's no deadlock, but there is potential livelock when contention is high. Overall, we see these are design points in a space which is a trade-off between optimism and levels of concurrency, and we also have to watch out for starvation, livelock and deadlock. These ideas will recur in the second half of the course, on distributed systems, with Doctor X. L7P29: Summary & Next Time Overall, we looked at how history graphs can help us determine whether we have a structural conflict, with a good or a bad schedule. We looked at the definition of isolation and compared that with strict isolation, and looked at various techniques for enforcing isolation. In particular, we looked at two-phase locking with rollback, timestamp ordering (again with rollback occasionally), and optimistic concurrency control, where we can get the rollback more cheaply because we only operated on copies of the data. Next time we'll look at the durability aspects, crash recovery and logs, and how to make those logs reliable against crashing. And if time permits, we'll have a look at transactional memory. END