Class 18
CS 439
21 March 2013

On the board
------------

1. Last time
2. Revisit locking

---------------------------------------------------------------------------

1. Last time

    I/O and disks. 

    thanks Parth

    some people wonder why we study disk geometry. answer: it
    affects how systems are built (for example, file systems try
    to place related data close together on disk to reduce seeks).

2. Revisit locking

    A. Recall game plan for managing concurrency:

        --build a lock/unlock primitive with hardware support

            if there's one CPU, we implement lock/unlock as
            disable/enable interrupts

            if there are multiple CPUs, we use spinlocks

                we saw one type of spinlock in an earlier class

                today, we study another one: MCS locks.


        --then we build higher-level abstractions from the low-level
        lock/unlock:

            mutexes
            monitors/CVs
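
        A minimal sketch of the multi-CPU case, assuming a simple
        test-and-set spinlock written with C11 <stdatomic.h>. The
        spinlock shown in class may differ in detail, and the names
        spin_init, spin_trylock, spin_lock, and spin_unlock are
        hypothetical. atomic_exchange plays the role of an atomic
        xchg instruction. The single-CPU case (disable/enable
        interrupts) needs privileged instructions, so it is not shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-set spinlock. */
struct spinlock {
    atomic_bool locked;  /* true while some CPU holds the lock */
};

void spin_init(struct spinlock *l) {
    atomic_init(&l->locked, false);
}

/* Atomically set locked = true; we got the lock iff it was false before. */
bool spin_trylock(struct spinlock *l) {
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

void spin_lock(struct spinlock *l) {
    /* Every failed attempt is still a write to the shared lock word, so
       under contention the coherence protocol keeps invalidating the
       line in other CPUs' caches (performance issues (ii) and (iii)). */
    while (!spin_trylock(l))
        ;  /* spin */
}

void spin_unlock(struct spinlock *l) {
    /* Sanity check: only a held lock may be released. */
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed));
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

        Typical use is spin_lock(&l); ... critical section ...;
        spin_unlock(&l). Mutexes and monitors/CVs are then built on
        top of a primitive like this one.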

    B. Review disadvantages of locking

        --hard to get right (though the advice we give you helps)
        --performance
        --performance/complexity trade-off
        --starvation
        ....

    C. Focus on performance. Use it as an excuse to cover some
    background on multiprocessor memory architectures.

	quick digression:

	    --_dance hall_ architecture: any CPU can "dance with" any
	    memory equally (equally slowly)

	    --NUMA (non-uniform memory access): each CPU has fast access
	    to some "close" memory; slower to access memory that is
	    further away
		--AMD Opterons are like this
		--Intel CPUs are moving toward this
		--see first page of handout

	    --two further choices: cache coherent or not. in the former
	    case, hardware runs a cache coherence (cc) protocol that
	    invalidates other CPUs' cached copies of a line when one CPU
	    writes to it. in the latter case, it does not, and software
	    must keep caches consistent. the former case is far more
	    common.

	let's assume ccNUMA machines...back to performance issues....

	the performance issues are:
	
	(i) fairness
	    --one CPU may keep winning the lock because the memory
	    holding the "locked" variable is closer to that CPU
	    --allegedly, Google had fairness problems on Opterons (I
	    have no proof of this)

	(ii) lots of traffic over the memory bus: if the lock is heavily
	     contended, the cache coherence protocol generates remote
	     invalidations every time a CPU attempts to acquire the lock
	     (with a test-and-set lock, each attempt is a write)

	(iii) cache line bounces: the line holding the lock variable
	      migrates from cache to cache (same cause as (ii))

	(iv) locking inherently reduces concurrency

        mitigation of (iv): more fine-grained locking
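
        A minimal sketch of what finer-grained locking can look like,
        assuming a hypothetical array of counters: instead of one mutex
        protecting the whole array, each counter gets its own pthread
        mutex, so threads updating different counters do not contend.
        The names (struct counters, counters_add, counters_read,
        counters_sum, NCOUNTERS) are made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NCOUNTERS 64  /* hypothetical size */

/* Fine-grained: one mutex per counter rather than one for the array. */
struct counters {
    struct {
        pthread_mutex_t lock;  /* protects value in this slot only */
        long value;
    } slot[NCOUNTERS];
};

void counters_init(struct counters *c) {
    for (size_t i = 0; i < NCOUNTERS; i++) {
        pthread_mutex_init(&c->slot[i].lock, NULL);
        c->slot[i].value = 0;
    }
}

void counters_add(struct counters *c, size_t i, long delta) {
    assert(i < NCOUNTERS);
    pthread_mutex_lock(&c->slot[i].lock);  /* only slot i is serialized */
    c->slot[i].value += delta;
    pthread_mutex_unlock(&c->slot[i].lock);
}

long counters_read(struct counters *c, size_t i) {
    assert(i < NCOUNTERS);
    pthread_mutex_lock(&c->slot[i].lock);
    long v = c->slot[i].value;
    pthread_mutex_unlock(&c->slot[i].lock);
    return v;
}

/* The complexity cost of finer granularity: a whole-array operation
   must visit every lock. Taking them one at a time (as here) means the
   total is not an atomic snapshot if other threads update concurrently. */
long counters_sum(struct counters *c) {
    long total = 0;
    for (size_t i = 0; i < NCOUNTERS; i++)
        total += counters_read(c, i);
    return total;
}
```

        The counters_sum comment illustrates the performance/complexity
        trade-off from B: more locks buy concurrency, but operations
        spanning many items get harder to make correct.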

	mitigation of (i)--(iii): better locks

	    --MCS locks

		--see handout

		--advantages

		    --guarantees FIFO ordering of lock acquisitions
		    (addresses (i))

		    --spins on local variable only (addresses (ii), (iii))

		    --[not discussing this, but: works equally well on
		      machines with and without coherent caches]

		--NOTE: with few cores, simple spinlocks perform better
		  than MCS locks. why?
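
		A minimal sketch of an MCS lock in C11 atomics, for
		comparison with the handout (the handout's code is the
		reference; the struct and function names here are
		hypothetical). Each waiter brings its own queue node,
		typically on its stack. Acquiring atomically swaps the node
		into the lock's tail pointer, which yields FIFO order; a
		waiter then spins only on the locked flag in its own node,
		and the releasing holder clears that flag directly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One queue node per waiting thread (hypothetical names). */
struct mcs_node {
    _Atomic(struct mcs_node *) next;  /* successor in the queue */
    atomic_bool locked;               /* true while this waiter must wait */
};

struct mcs_lock {
    _Atomic(struct mcs_node *) tail;  /* last node in queue; NULL if free */
};

void mcs_init(struct mcs_lock *l) {
    atomic_init(&l->tail, NULL);
}

void mcs_acquire(struct mcs_lock *l, struct mcs_node *me) {
    /* Node is not yet visible to other threads, so plain init is safe. */
    atomic_init(&me->next, NULL);
    atomic_init(&me->locked, true);

    /* Atomically append ourselves; pred is the previous tail. */
    struct mcs_node *pred =
        atomic_exchange_explicit(&l->tail, me, memory_order_acq_rel);
    if (pred == NULL)
        return;  /* queue was empty: we hold the lock */

    /* Link behind the predecessor, then spin on OUR OWN flag only. */
    atomic_store_explicit(&pred->next, me, memory_order_release);
    while (atomic_load_explicit(&me->locked, memory_order_acquire))
        ;  /* local spin */
}

void mcs_release(struct mcs_lock *l, struct mcs_node *me) {
    /* While we hold the lock, tail is us or a later waiter. */
    assert(atomic_load_explicit(&l->tail, memory_order_relaxed) != NULL);

    struct mcs_node *succ =
        atomic_load_explicit(&me->next, memory_order_acquire);
    if (succ == NULL) {
        /* No known successor: try to mark the lock free. */
        struct mcs_node *expected = me;
        if (atomic_compare_exchange_strong_explicit(
                &l->tail, &expected, NULL,
                memory_order_acq_rel, memory_order_acquire))
            return;  /* nobody was waiting */
        /* A waiter swapped itself into tail but has not linked yet. */
        while ((succ = atomic_load_explicit(&me->next,
                                            memory_order_acquire)) == NULL)
            ;
    }
    /* Hand the lock directly to the next waiter (FIFO). */
    atomic_store_explicit(&succ->locked, false, memory_order_release);
}
```

		Each thread passes the same node to mcs_acquire and
		mcs_release, e.g. struct mcs_node n; mcs_acquire(&l, &n);
		... mcs_release(&l, &n);. The node must stay valid until
		mcs_release returns.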