How to Make Snapshot Isolation Serializable (Serializable Snapshot)


Recall that Snapshot Isolation works as follows:

(1) Reads from transaction Ti
read committed data as of the time Ti began.

(2) Writes follow the first committer wins rule:
Ti will successfully commit only if no other concurrent transaction 
Tk has already committed writes to data items where Ti has written
versions that it intends to commit.

Consider an example
where snapshot isolation yields a non-serializable execution.

initially x = 3, y = 17

T1: x := y

T2: y := x

Any serial execution would result in x and y both being 3 or both being 17.

R1(y) R2(x) W1(x) W2(y), result is that x = 17 and y = 3

Notice that there is a rw dependency from T1 to T2, because 
T1 reads an earlier version of y than produced by T2.
Similarly there is an rw dependency from T2 to T1 based on x.
Snapshot isolation does not prevent this cycle in the serialization graph.

In order to make snapshot isolation serializable is to prevent
such cycles from arising by judiciously aborting certain transactions.

Now, let us review the three types of conflict edges.

If T1 and T2 have a write conflict and they both commit with
T1 preceding T2 in the commit order, then 
T1 must commit before T2 starts.
Otherwise, T2 will abort.
So the edge T1 --ww---> T2 implies that T1 completes
before T2 begins. They are not concurrent.

Similarly, if T1 writes something that T2 reads, then 
T2 must begin after T1 commits, because T2 reads committed values
at the time it begins.
Thus T1 --wr---> T2 implies T1 completes before T2 begins.
Again, they are not concurrent.

Therefore the only conflict edges between transactions that could
involve concurrent transactions are rw edges.
These are sometimes called anti-dependencies.

Now a theorem from the paper 
Making Snapshot Isolation Serializable
by Alan Fekete University of Sydney Dimitrios Liarokapis, Elizabeth O'Neil, and 
Patrick O'Neil University of Massachusetts and Dennis Shasha

shows that the only way a cycle in the serialization graph can arise
in a snapshot isolation setting is if the cycle contains
two rw edges in a row. So the fix is that whenever this happens, abort
some transaction.


THEOREM 2.1. Suppose H is a multiversion history produced under Snapshot 
Isolation that is not serializable. Then there is at least one cycle in the 
serialization graph DSG(H), and we claim that in every cycle there are three 
consecutive transactions Ti.1,Ti.2,Ti.3 (where it is possible that Ti.1 and 
Ti.3 are the same transaction) such that Ti.1 and Ti.2 are concurrent, with an 
edge Ti.1 --rw--> Ti.2, and Ti.2 and Ti.3 
are concurrent with an edge Ti.2 --rw--> Ti.3. 


PROOF.ByAdya et al. [2000], a cycle exists in DSG(H). Take an arbitrary cycle 
in DSG(H), and let Ti.3 be the transaction in the cycle with the earliest 
commit time; let Ti.2 be the predecessor of Ti.3 in the cycle, and let Ti.1 be 
the predecessorof Ti.2 in the cycle.
Suppose for the sake of contradiction that Ti.2 and 
Ti.3 are not concurrent: then either 

(a) Ti.2 finishes before Ti.3 starts (but then 
this is before Ti.3 commits, contradicting the choice of Ti.3 as the earliest 
committed transaction in the cycle), or 

(b) Ti.3 finishes before Ti.2 starts (but 
this contradicts the presence of an edge Ti.2 --> Ti.3 in DSG(H) by Lemma 2.2.). 
Thus we have shown that Ti.2 must be concurrent with Ti.3, and therefore the 
edge from Ti.2 to Ti.3 is an anti-dependency, Ti.2 --rw--> Ti.3. Because Ti.2 and 
Ti.3 are concurrent, the start of Ti.2 precedes the commit of Ti.3 and the 
start of Ti.3 precedes the commit of Ti.2. 

Now suppose for the sake of 
contradiction that Ti.1 was not concurrent with Ti.2. Then either 

(a) the commit of 
Ti.1 precedes the start of Ti.2 (which by the last sentence of the prior 
paragraph precedes the commit of Ti.3,so the commit of Ti.1 precedes the commit 
of Ti.3 contradicting the choice of Ti.3 as earliest committed transaction in 
the cycle), or 

(b) the commit of Ti.2 precedes the start of Ti.1 (contradicting the 
presence of an edge Ti.1 --rw--> Ti.2 in DSG(H)). 
Thus we have shown that Ti.1 is 
concurrent with Ti.2, and so the edge from Ti.1 to Ti.2 must be an 
anti-dependency, Ti.1 --rw--> Ti.2.
End of Proof


The subsequent paper
https://courses.cs.washington.edu/courses/cse444/08au/544M/READING-LIST/fekete-sigmod2008.pdf

took those results and detected transactions having the characteristics
of Ti.2 above and aborted them. This eliminated any chance of deadlocks at
the cost of aborting some transactions that did not need to be aborted.
The paper showed that in practice this was not a problem.


Implementation notes:

At each variable x, if transaction T wants to read x from site s,
record the start time for T and perform the read of the value
produced by the last transaction T' that committed x on s before the
start time of T and s was up the whole time before the start of T. Note
however that if T itself wrote x, then
the read should return the value that T wrote, because a transaction
can always see its own writes.

If T comes writes at site s, then record the value written but do not allow
anyone to see it until (and if) T commits.

When an end(T) occurs, for each access of T, determine whether 
T should abort either
for available copies reasons (i.e. T accessed x on a site that later 
failed) of for Snapshot Isolation reasons (i.e. some other transaction T'
modified x after T began, T wrote x,
and T' committed before the end(T) occurred) or because
committing T would create
a cycle in the serializable graph including two rw edges in a row.

Basically, we want to maintain a serialization graph with edges
decorated with rw, ww, or wr.
Then we want to detect cycles in that graph and abort a transaction
whenever it's going to particpate in a cycle with two rw edges
in a row. If T has two edges to another transaction, we record both
of them, so T --rw--> T', T --ww---> T', and T --wr-->T' could all be possible.
Now, we have to decide how to construct this graph.

Upon end(T'),
add T --ww--> T' to the serialization graph if T commits before T'
begins, and they both write to x.

Upon end(T'), add  
T --wr-->T' to the serialization graph if T writes to x,
commits before T' begins, T' reads from x, and T' commits.

Upon end(T') 
add
T --rw --> T' to the serialization graph if T reads from x, T' writes to
x, and T begins before end(T').

Upon end(T') 
add
T' --rw --> T'' to the serialization graph if T' reads from x, T'' writes to
x, and T' begins before end(T'').

In every case above:
If committing T' causes a serialization graph cycle
having two rw edges in a row however, then don't commit T' and
remove T' and all associated
edges from  the serialization graph, otherwise commit T' and leave
it in the
serialization graph.