CONCURRENT AND DISTRIBUTED SYSTEMS LECTURE 6 B TRANSCRIPT (C) 2023 DJ Greaves, University of Cambridge, Computer Laboratory. L6-B TRANSACTIONS Now we need to move on to the next main subject, which is composite operations. We've seen so far ways for safe concurrent access to a single object. But sometimes we need atomicity in operations that span multiple objects. We'll call that a composite operation. Obviously, we could increase the set of primitives so that more complex things that we need to do are still atomic operations, but this will become infeasible. It's much better to have a small set of basic primitives and then build composite object operations on top of those. Many of our best examples are going to come from banking. Suppose we can perform three operations on an account: get the balance, credit a certain amount or debit a certain amount. Each of these individual operations will be atomic and may involve taking out locks while it takes place. Is this going to be a sufficient level of locking, or are we going to need something above that, such as another layer of locks or clever system design? Let's consider two threads that want to operate on savings and current accounts. T1 will transfer £100.00 from one to the other. T2 will give us our total balance. What could possibly go wrong? If we look at the possible interleavings of these two threads, there are two possible answers for the balance. There's also the chance that we could lose £100. That's because we must always be careful about system failure, and if T1 crashes between the debit and the credit, then £100 has disappeared. Two different kinds of problem are illustrated here. The first is what we call isolation. The individual operations being atomic is not enough. We want the credit and the debit making up the transfer to happen as one operation. As always, we could add another primitive, such as a "transfer primitive", but this is not going to be desirable or feasible for every possible sort of thing that we want to do in the future. And the second problem is fault tolerance in the real world: programs or computers fail, so we need to make sure we can recover safely. Hence, we need to introduce the concept of a TRANSACTION, which you may have come across in databases or elsewhere. CLICK Let's imagine we can write in a language which has transaction as a keyword. Now we can define a transaction. We might give it some arguments, the amount and so on, and the names of the accounts. But we're going to have a basic block in the body of our method that behaves atomically. This block is either going to execute completely correctly and commit its results, making persistent changes to the environment, or else it will abort and no changes will have been made to the environment at all. And moreover, we should get a notification that it's aborted, ideally in the return code. If we don't get an explicit notification, then we'll see later how to recover from that situation. We also want this transaction to provide isolation, which we'll define in the next slide.
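Since no mainstream language actually gives us a transaction keyword, here is a minimal sketch in Java of the transfer we have in mind, using an explicit commit/abort result in place of the keyword. The Account class, the pence-denominated balances and the InsufficientFundsException are illustrative assumptions, not the lecture's code; and note that on its own this sketch provides neither isolation from concurrent threads nor durability, which is exactly what a real transaction system must add.

    class InsufficientFundsException extends Exception {}

    class Account {
        private long balance;                      // balance in pence, avoiding floating point
        Account(long initial) { balance = initial; }
        long getBalance() { return balance; }
        void credit(long amount) { balance += amount; }
        void debit(long amount) throws InsufficientFundsException {
            if (balance < amount) throw new InsufficientFundsException();
            balance -= amount;
        }
    }

    class TransferSketch {
        // Returns true if the transfer committed, false if it aborted:
        // the abort notification arrives in the return code, as desired.
        static boolean transfer(Account from, Account to, long amount) {
            // transaction {                       // what we would like to write
            try {
                from.debit(amount);                // both operations must take effect ...
                to.credit(amount);                 // ... or neither must
                return true;                       // commit: changes persist
            } catch (InsufficientFundsException e) {
                return false;                      // abort: no changes made
            }
            // }
        }

        public static void main(String[] args) {
            Account savings = new Account(25_000); // £250.00
            Account current = new Account(1_000);  // £10.00
            boolean ok = transfer(savings, current, 10_000);   // transfer £100.00
            System.out.println("committed=" + ok + " savings=" + savings.getBalance()
                               + " current=" + current.getBalance());
        }
    }

In this particular sketch nothing needs undoing on abort, because the debit fails before it mutates anything; a general transaction system has to be able to roll back arbitrary partial updates.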
CLICK So we come to our ACID properties. We want our committed transactions to satisfy all four of these properties. Atomicity: either all or none of the transaction's operations are performed. The programmer doesn't need to worry about cleaning up. Consistency: a transaction transforms the system from one consistent state to another. In other words, it preserves our system invariants. The programmer must ensure this through the structure of the code. For instance, for banking we need conservation of money. Money shall not be created or destroyed (unless you're the Bank of England). Isolation: each transaction executes as though it were isolated from the concurrent effects of the others. Hence we can ignore any other concurrent transaction, even if it is only partway through its execution. And finally, durability: the effects of a committed transaction survive subsequent system failures. If the system reports success, we must ensure this is recorded permanently and persistently, such as on secondary storage. You might think this is a slightly different definition of atomicity from the one we used previously. If so, that's all right. CLICK Our ACID properties divide essentially into two categories. Atomicity and durability deal with making sure the system is safe even across failures. "A" means no partially complete transactions and "D" means transactions previously reported as committed don't disappear, even after a system crash. Consistency and isolation, on the other hand, ensure the correct behaviour even in the face of concurrency. Consistency means that we can always code as though the invariants are in place, unless, of course, we've just locally broken them in the body of our transaction and we'll restore them before we finish. And isolation means that concurrently executing transactions are indivisible: any other concurrent actions in the system are completely hidden from us. We'll drill down on that in the next slide. CLICK To achieve isolation, the most obvious and brute-force approach is to have a global, system-wide lock. We lock the whole system while we make our transaction and we unlock it afterwards. Now that would give us what we want: perfect isolation. But of course it also prohibits any concurrency. Instead we would have serialisation through a global shared lock, which is always a bad design. Also, just having a global lock isn't sufficient to enable us to voluntarily abort a transaction halfway through, which is a desirable feature for the system to provide. If we weren't able to credit the amount to be transferred, then we would have to stop. Actually, in reality, not being able to credit an account is probably rarer than not being able to debit it, owing to overdraft limitations and so on. Although not being able to credit an account does happen sometimes: certain educational institutions in Boston have run into the problem that they can't credit their account because it's reached the maximum legal limit. Well then: taking out the global lock around a transaction for isolation purposes is not practical because of the concurrency restrictions it would incur, but it's a very useful paradigm for us to bear in mind as programmers. HERE Much like when we were writing code inside a monitor, where our coding style could assume we were in a critical section, it's nice if we can assume perfect isolation when we write the body of a transaction. Note, these two transactions can be run in either order and we get the same result. But that's not directly related to what serializable means when we define it shortly. Serializable also applies to (the effects of) transactions which, when run in different orders, might produce different results. I've just repeated that up front because it's often a source of confusion. CLICK If the transaction keyword didn't do anything magical, there would be six possible interleavings of the four atomic operations in these two transactions, and they will potentially have differing effects.
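To make those six interleavings concrete, here is a small sketch, not from the lecture, that enumerates every ordering of T1's debit-then-credit against T2's two balance reads and prints the total that T2 would report. The starting balances of £200 and £100 and the £100 transfer amount are assumptions chosen purely for illustration.

    import java.util.ArrayList;
    import java.util.List;

    class Interleavings {
        public static void main(String[] args) {
            List<boolean[]> schedules = new ArrayList<>();
            interleave(new boolean[4], 0, 2, 2, schedules);   // true = T1's next op, false = T2's

            for (boolean[] sched : schedules) {
                long s = 20_000, c = 10_000;                  // savings £200, current £100 (in pence)
                long readS = 0, readC = 0;
                int t1 = 0, t2 = 0;                           // next operation index per thread
                for (boolean fromT1 : sched) {
                    if (fromT1) {
                        if (t1++ == 0) s -= 10_000;           // T1 op 1: debit savings by £100
                        else           c += 10_000;           // T1 op 2: credit current by £100
                    } else {
                        if (t2++ == 0) readS = s;             // T2 op 1: read savings
                        else           readC = c;             // T2 op 2: read current
                    }
                }
                System.out.println(describe(sched) + " -> total seen by T2 = £" + (readS + readC) / 100);
            }
        }

        // Recursively builds every interleaving that preserves each thread's own order.
        static void interleave(boolean[] sched, int pos, int t1Left, int t2Left, List<boolean[]> out) {
            if (t1Left == 0 && t2Left == 0) { out.add(sched.clone()); return; }
            if (t1Left > 0) { sched[pos] = true;  interleave(sched, pos + 1, t1Left - 1, t2Left, out); }
            if (t2Left > 0) { sched[pos] = false; interleave(sched, pos + 1, t1Left, t2Left - 1, out); }
        }

        static String describe(boolean[] sched) {
            StringBuilder sb = new StringBuilder();
            for (boolean b : sched) sb.append(b ? "T1 " : "T2 ");
            return sb.toString().trim();
        }
    }

Four of the six interleavings report £300, matching one of the two serial orders; the other two report £200 and £400, the anomalous totals mentioned earlier.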
We'll now define the term serializable. When we've run these two transactions, we have a return value and we have mutable effects on the environment. We say that the outcome (the return value and the mutable effects) was a serializable outcome, arising from these two transactions, if it reflects some order of running these transactions one after the other. Not all orderings have to give the same result, but there must nominally exist a potential ordering of the transactions that gives the results that we observe in the real system. We'll look at that in a bit more detail next. Here's one serialisation of the atomic operations of our two transactions. T1 ran fully before T2, so the actual execution was SERIAL and, not surprisingly, the results of this are serializable. But this second example is not a serial execution. We have an interleaving of T1 and T2 where the result is still serializable. If we look carefully, we see the results are identical to serially executing T2 and then T1. This schedule of operations is therefore serializable. [And shortly we'll see this is serializable because we have only swapped the execution orders of non-conflicting operations.] All of the operations that T1 performed on an object were after T2 had last touched that object. So we had genuine concurrency, but we had a serializable result. CLICK This, on the other hand, shows a non-serial execution that is not serializable. T1 sees the old S and the new C, and it will get the wrong answer for the total balance. Our second interleaving here also gets the wrong answer. One bad isolation pattern omits the amount transferred from the total and the other counts that amount twice in the total. Both orderings have swapped two conflicting operations such that there is no matching serial execution to explain what we've observed. We will define a conflicting operation more formally shortly. There are quite a lot of ways that we can analyse code to work out whether an interleaving would result in a serializable execution or not. And some of these can be quite tricky in practice, especially when we're using arrays and we need to work out whether two addresses or subscripts are equal or definitely not equal. Ultimately, it's limited by decidability and the halting problem, and that sort of thing, from the Computation Theory course. And not surprisingly, there's a vast research literature on this subject. Note that the leaf operations could be anything in general: operations on trees and other data structures; one might delete something the other is working on! Two might overwrite the same value and we would want to ensure a 'last-writer wins' rule, even though the decision on who was the last writer might be made post hoc. CLICK So here we'll just look at conflict serializability analysis. We have a schedule, or interleaving, of operations called S. We want to see if it matches any serialisation out of a set called T. We can look at conflicts and whether any conflicts have been permuted. If all the conflicts in S have the same relative ordering as all the conflicts in one of the possible T's, then we're OK. And what do we mean by a conflict? Well, the example definition we'll use is of non-commutative behaviour between two operations. For instance, reading and writing a variable are not commutative: reading it before the write gives a different result from reading it after the write. [In general, that is: a more sophisticated analysis could check whether the write happened to store the value that was already there!] And the same goes for an element in an array and so on. Two reads at the same location can be freely swapped over, but two writes may not, because the last write takes priority.
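As a tiny illustration, here is one common concrete form of that conflict rule in Java (an assumed encoding, not lecture code): two operations on the same item conflict unless both are reads; operations within a single transaction are already ordered by that transaction's own program order.

    class ConflictRule {
        enum Kind { READ, WRITE }

        // Reads of the same item commute; anything involving a write does not.
        static boolean conflict(String itemA, Kind kindA, String itemB, Kind kindB) {
            return itemA.equals(itemB) && (kindA == Kind.WRITE || kindB == Kind.WRITE);
        }

        public static void main(String[] args) {
            System.out.println(conflict("S", Kind.READ,  "S", Kind.WRITE)); // true: order matters
            System.out.println(conflict("S", Kind.READ,  "S", Kind.READ));  // false: reads commute
            System.out.println(conflict("S", Kind.WRITE, "C", Kind.WRITE)); // false: different items
        }
    }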
We can create a history graph to describe the way a set of concurrent transactions have meshed. We could potentially do this offline before we run, but by and large, transactions will have a dynamic interleaving, so we need to generate these online. We'll have nodes that represent the operations on the shared variables and we'll have two types of arc. This is set out so that the horizontal arcs simply sequence the operations within one transaction. That ordering is static. It doesn't get affected by interleaving with other transactions. For offline analysis, there may be conditional control flow within a transaction, so it would not be a line. It could potentially be a tree, or even a cyclic loop structure. That would not be a 'history graph'; it would be a 'future graph'! So it is linear for any specific execution trace under dynamic analysis, which is all we consider here. The second type of arc that we'll add to the history graph connects together operations on a common resource. This will trace out a line and it will show the order in which the operations on an item took place. We don't need this arc where a pair of operations don't conflict: for instance, if they're both reads, or if they are just increments and decrements of a non-saturating (no overdraft limit) value. Non-conflicting operations can be commuted with the same overall effect. These relative operational orderings, of course, will vary from one run of the program to another, according to how the interleaving actually occurred. Here are the history graphs for our two good schedules that we considered before. We can now ignore the left-to-right arrows; we need to look carefully at the up and down arrows. In the upper plot here, each up/down arrow goes from T1 to T2. This is fine. This is tantamount to saying that T1 effectively fully ran before T2. Of course there could well have been overlap, as T1 may still have been running when T2 started, but it didn't do anything conflicting. And on the lower history graph here, we see pretty much the same situation. Both arrows go upwards. So although T1 and T2 were interleaved to some extent, the resultant effect is going to be the same as though T2 ran fully before T1. CLICK On the other hand, if we look at the history graphs for our two schedules that didn't work, we can see a pattern here. We have a mixture of arrows going up and down. And in a sense, we have a cycle. It's not really a cycle if you look at the left-to-right arrows in the individual thread histories, so for this consideration we treat T1 and T2 as atomic nodes in themselves, and we see that we have arrows in both directions between them. In general, of course, we'll have more than two threads, and all we're saying is: can we arrange these in an order where there are no cycles between the nodes that represent the threads? Where we are unable to totally order the threads, conflict serializability analysis has failed (a sketch of this cycle check follows below). It's possible that another form of analysis could show that this interleaving was indeed serializable, because we've abstracted the details of the behaviour into a simple concept of a conflicting operation. Certain patterns of behaviour might cancel each other out and we could end up with something that is acceptable. Any form of abstract model is going to potentially oversimplify, and the real behaviour might have been OK.
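Here is a minimal sketch of that check in Java, run on the problematic interleaving where the balance reader sees one account before the transfer and the other after it. The Op record, the transaction names and the schedule encoding are illustrative assumptions rather than the lecture's notation; the point is only the shape of the analysis: record a conflict edge between two transactions whenever their operations conflict in schedule order, then test whether the resulting transaction-level graph has a cycle.

    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    class SerializabilityCheck {
        record Op(String txn, String item, boolean isWrite) {}

        // Two operations from different transactions conflict if they touch the
        // same item and at least one of them is a write.
        static boolean conflict(Op a, Op b) {
            return !a.txn().equals(b.txn()) && a.item().equals(b.item())
                   && (a.isWrite() || b.isWrite());
        }

        // True iff the schedule is conflict serializable, i.e. the graph of
        // conflict edges between transactions is acyclic.
        static boolean conflictSerializable(List<Op> schedule) {
            Set<String> txns = new HashSet<>();
            Set<List<String>> edges = new HashSet<>();    // each edge is (from, to)
            for (int i = 0; i < schedule.size(); i++) {
                txns.add(schedule.get(i).txn());
                for (int j = i + 1; j < schedule.size(); j++) {
                    if (conflict(schedule.get(i), schedule.get(j))) {
                        edges.add(List.of(schedule.get(i).txn(), schedule.get(j).txn()));
                    }
                }
            }
            // Repeatedly remove transactions with no incoming edges; anything left is in a cycle.
            Set<String> remaining = new HashSet<>(txns);
            boolean removed = true;
            while (removed) {
                removed = false;
                for (String t : new ArrayList<>(remaining)) {
                    boolean hasIncoming = edges.stream()
                        .anyMatch(e -> e.get(1).equals(t) && remaining.contains(e.get(0)));
                    if (!hasIncoming) { remaining.remove(t); removed = true; }
                }
            }
            return remaining.isEmpty();
        }

        public static void main(String[] args) {
            // Reader sees S before the transfer but C after it: not serializable.
            List<Op> bad = List.of(
                new Op("Reader",   "S", false),
                new Op("Transfer", "S", true),
                new Op("Transfer", "C", true),
                new Op("Reader",   "C", false));
            System.out.println("conflict serializable? " + conflictSerializable(bad));
        }
    }

On this schedule it prints "conflict serializable? false", because there is a conflict edge from the reader to the transfer on S and one back again on C, exactly the two-way pattern of arrows described above.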
But this was a very simple example and we can stare at the code ourselves and be convinced that there is no resolution to this problem: the interleaving simply isn't serializable. CLICK Here's the earlier slide again, and now you can draw the conflict arrows on it yourself. In the first one, T1 sees the old S and the new C, and in the second one T1 sees the new S and the old C. So just to summarise, the isolation preservation system, or transaction processing system, needs to monitor these conflicts. It needs to make sure that no transactions that have suffered a conflict are allowed to commit. We'll sometimes see that some of the transactions can commit and others need to be aborted. Later we'll consider how transaction systems might make an intelligent choice about which ones to commit and which ones to abort, so as to optimise throughput (or owner revenue!). CLICK In the first half of this lecture, we looked at alternatives to shared memory. Although some of these perhaps look a little bit esoteric, it's worth bearing in mind that when we come to look at distributed systems, we won't have shared memory anyway. And when we move up to higher levels to consider transaction processing systems, we find a whole new need for atomicity. We looked at what it means for two transactions to be effectively isolated, i.e. that they have a serializable effect on the world. The world should experience that they happened in some particular order. The world doesn't see them getting mixed up with each other. And we looked at history graphs and some limited analysis of these to determine whether a sequence of operations could reflect a serializable behaviour. Next time we'll look further at transactions and consider isolation in more detail. We'll define strict isolation and look at situations where we need to abort a transaction. Recall that when a transaction is aborted, it's as though nothing happened at all for that transaction. We'll look at three designs of transaction system: one using two-phase locking and rollback, one using timestamp ordering, and one using optimistic concurrency control. END