How to Make Snapshot Isolation Serializable (Serializable Snapshot) Recall that Snapshot Isolation works as follows: (1) Reads from transaction Ti read committed data as of the time Ti began. (2) Writes follow the first committer wins rule: Ti will successfully commit only if no other concurrent transaction Tk has already committed writes to data items where Ti has written versions that it intends to commit. Consider an example where snapshot isolation yields a non-serializable execution. initially x = 3, y = 17 T1: x := y T2: y := x Any serial execution would result in x and y both being 3 or both being 17. R1(y) R2(x) W1(x) W2(y), result is that x = 17 and y = 3 Notice that there is a rw dependency from T1 to T2, because T1 reads an earlier version of y than produced by T2. Similarly there is an rw dependency from T2 to T1 based on x. Snapshot isolation does not prevent this cycle in the serialization graph. In order to make snapshot isolation serializable is to prevent such cycles from arising by judiciously aborting certain transactions. Now, let us review the three types of conflict edges. If T1 and T2 have a write conflict and they both commit with T1 preceding T2 in the commit order, then T1 must commit before T2 starts. Otherwise, T2 will abort. So the edge T1 --ww---> T2 implies that T1 completes before T2 begins. They are not concurrent. Similarly, if T1 writes something that T2 reads, then T2 must begin after T1 commits, because T2 reads committed values at the time it begins. Thus T1 --wr---> T2 implies T1 completes before T2 begins. Again, they are not concurrent. Therefore the only conflict edges between transactions that could involve concurrent transactions are rw edges. These are sometimes called anti-dependencies. Now a theorem from the paper Making Snapshot Isolation Serializable by Alan Fekete University of Sydney Dimitrios Liarokapis, Elizabeth O'Neil, and Patrick O'Neil University of Massachusetts and Dennis Shasha shows that the only way a cycle in the serialization graph can arise in a snapshot isolation setting is if the cycle contains two rw edges in a row. So the fix is that whenever this happens, abort some transaction. THEOREM 2.1. Suppose H is a multiversion history produced under Snapshot Isolation that is not serializable. Then there is at least one cycle in the serialization graph DSG(H), and we claim that in every cycle there are three consecutive transactions Ti.1,Ti.2,Ti.3 (where it is possible that Ti.1 and Ti.3 are the same transaction) such that Ti.1 and Ti.2 are concurrent, with an edge Ti.1 --rw--> Ti.2, and Ti.2 and Ti.3 are concurrent with an edge Ti.2 --rw--> Ti.3. PROOF.ByAdya et al. [2000], a cycle exists in DSG(H). Take an arbitrary cycle in DSG(H), and let Ti.3 be the transaction in the cycle with the earliest commit time; let Ti.2 be the predecessor of Ti.3 in the cycle, and let Ti.1 be the predecessorof Ti.2 in the cycle. Suppose for the sake of contradiction that Ti.2 and Ti.3 are not concurrent: then either (a) Ti.2 finishes before Ti.3 starts (but then this is before Ti.3 commits, contradicting the choice of Ti.3 as the earliest committed transaction in the cycle), or (b) Ti.3 finishes before Ti.2 starts (but this contradicts the presence of an edge Ti.2 --> Ti.3 in DSG(H) by Lemma 2.2.). Thus we have shown that Ti.2 must be concurrent with Ti.3, and therefore the edge from Ti.2 to Ti.3 is an anti-dependency, Ti.2 --rw--> Ti.3. Because Ti.2 and Ti.3 are concurrent, the start of Ti.2 precedes the commit of Ti.3 and the start of Ti.3 precedes the commit of Ti.2. Now suppose for the sake of contradiction that Ti.1 was not concurrent with Ti.2. Then either (a) the commit of Ti.1 precedes the start of Ti.2 (which by the last sentence of the prior paragraph precedes the commit of Ti.3,so the commit of Ti.1 precedes the commit of Ti.3 contradicting the choice of Ti.3 as earliest committed transaction in the cycle), or (b) the commit of Ti.2 precedes the start of Ti.1 (contradicting the presence of an edge Ti.1 --rw--> Ti.2 in DSG(H)). Thus we have shown that Ti.1 is concurrent with Ti.2, and so the edge from Ti.1 to Ti.2 must be an anti-dependency, Ti.1 --rw--> Ti.2. End of Proof The subsequent paper https://courses.cs.washington.edu/courses/cse444/08au/544M/READING-LIST/fekete-sigmod2008.pdf took those results and detected transactions having the characteristics of Ti.2 above and aborted them. This eliminated any chance of deadlocks at the cost of aborting some transactions that did not need to be aborted. The paper showed that in practice this was not a problem. Implementation notes: At each variable x, if transaction T wants to read x from site s, record the start time for T and perform the read of the value produced by the last transaction T' that committed x on s before the start time of T and s was up the whole time before the start of T. Note however that if T itself wrote x, then the read should return the value that T wrote, because a transaction can always see its own writes. If T comes writes at site s, then record the value written but do not allow anyone to see it until (and if) T commits. When an end(T) occurs, for each access of T, determine whether T should abort either for available copies reasons (i.e. T accessed x on a site that later failed) of for Snapshot Isolation reasons (i.e. some other transaction T' modified x after T began, T wrote x, and T' committed before the end(T) occurred) or because committing T would create a cycle in the serializable graph including two rw edges in a row. Basically, we want to maintain a serialization graph with edges decorated with rw, ww, or wr. Then we want to detect cycles in that graph and abort a transaction whenever it's going to particpate in a cycle with two rw edges in a row. If T has two edges to another transaction, we record both of them, so T --rw--> T', T --ww---> T', and T --wr-->T' could all be possible. Now, we have to decide how to construct this graph. Upon end(T'), add T --ww--> T' to the serialization graph if T commits before T' begins, and they both write to x. Upon end(T'), add T --wr-->T' to the serialization graph if T writes to x, commits before T' begins, T' reads from x, and T' commits. Upon end(T') add T --rw --> T' to the serialization graph if T reads from x, T' writes to x, and T begins before end(T'). Upon end(T') add T' --rw --> T'' to the serialization graph if T' reads from x, T'' writes to x, and T' begins before end(T''). In every case above: If committing T' causes a serialization graph cycle having two rw edges in a row however, then don't commit T' and remove T' and all associated edges from the serialization graph, otherwise commit T' and leave it in the serialization graph.