WFQ:
===============
* Published earlier than XCP.
* But conceptually comes last in the progression of adding more and more intelligence to routers.
* TCP Tahoe assumes routers do nothing special, except that routers will eventually drop packets because they have to.
* XCP assumes routers can compute rich feedback and run fairness and efficiency controllers, but still assumes FIFO routers.
* WFQ assumes routers can actually decide the order in which packets are scheduled, to enforce fairness on a packet-by-packet basis.
* Roughly speaking, WFQ is a more sophisticated version of round robin with
--> 1. better delay guarantees than having fixed quotas for each flow.
--> 2. the ability to handle different packet sizes fairly.

The tradeoff space:
==============
* More intelligence in the routers means more obstacles to deployment (this is changing now).
* At the same time, more intelligence in the routers means we can expect more out of our network.
--> WFQ can provide protection from arbitrary misbehaving sources, not just misbehaving TCP or XCP sources.
--> Fairness is provided on a packet-by-packet basis (also called isolation). XCP/TCP provide it only after convergence.
* What is isolation? Roughly speaking: if you stick to your allocated bandwidth, you will not be affected by anyone else.
--> Figure 1 illustrates isolation very well.
--> Aside: The math is complicated (the authors use the term "forbidding" :)), but really the implications are more interesting.
--> What are the implications?
--> Implication 1: As long as the Telnet source is sending under its fair share, it will get really low delay.
--> Implication 2: If it exceeds its fair share, its delays will skyrocket.
--> A much more formal version of this result, called the Parekh-Gallager theorem, appeared in "A Generalized Processor Sharing Approach to Flow Control in Integrated Services Networks: The Single Node Case" and "A Generalized Processor Sharing Approach to Flow Control in Integrated Services Networks: The Multiple Node Case".
--> The Parekh-Gallager theorem says (very informally): "if everyone promises to stay under a particular transmission rate, and the network can support the sum of these transmission rates, and the routers run WFQ, then everyone's worst-case per-packet delay can be bounded."
--> In general, WFQ incentivizes good congestion control because a sender's bad behavior can only hurt that sender. This is the exact opposite of FCFS/FIFO.

But, what's the catch with WFQ?
================
* Need to maintain a separate queue for each flow.
* This leads to the dreaded per-flow state (at the very least, to track the head and tail of each flow's queue). We actually need a bit more state for each flow.
* The point is it can get bad if you have many, many flows.
* But, is it really so bad? Depends.
* In the core of an ISP, yes. There may be a few million flows.
* In a datacenter, maybe not. There might be a few thousand flows, depending on the size of the datacenter.
* And the number of active flows is even smaller.

The WFQ algorithm itself (partly based on Section 3 of http://web.mit.edu/6.829/www/2016/papers/fq-notes.pdf)
================
* Based on Nagle's round-robin algorithm: service each flow for a time quantum and then move on to the next.
* But round robin has problems (the toy simulation below makes problem 1 concrete).
--> 1. It can be unfair if one flow uses much larger packets than another.
--> 2. Also, if a packet arrives just after its turn, it has to wait through the time quantum of every other flow before it can be transmitted.
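A quick illustration of problem 1. This is a toy sketch, not from any of the papers: the two flows, packet sizes, and turn count below are invented for illustration. Packet-wise round robin serves one *packet* per flow per turn, so a flow with bigger packets grabs more of the link:

    from collections import deque

    # Hypothetical setup: two always-backlogged flows. Flow A sends
    # 1500-byte packets, flow B sends 100-byte packets.
    flows = {
        "A": deque([1500] * 1000),  # packet sizes in bytes
        "B": deque([100] * 1000),
    }

    sent = {"A": 0, "B": 0}  # bytes transmitted per flow

    for _ in range(200):  # 200 round-robin turns
        for name, q in flows.items():
            if q:  # serve exactly one packet from each backlogged flow
                sent[name] += q.popleft()

    total = sum(sent.values())
    for name, nbytes in sent.items():
        print(f"flow {name}: {nbytes} bytes ({100 * nbytes / total:.0f}% of link)")
    # Prints ~94% for A and ~6% for B: "fair" in packets is unfair in bandwidth.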
* How do we fix this? What's the ideal behavior?
* Idealized model: bit-by-bit round robin, which is unattainable. Go one bit at a time from each flow.
* How do we approximate bit-by-bit round robin?
* Some definitions:
--> Round: one cycle through all queues with data (backlogged queues), where we send one bit per flow in a round.
--> dr/dt = mu / N, where mu is the link rate and N is the number of backlogged flows (i.e., more flows means the round number r grows more slowly with time).
--> Notice that N can keep varying as the number of active/backlogged queues changes.
--> And this will change the rate of change of r.
--> Implication: tracking r accurately is hard. We'll come back to this soon.
* Let's say we have r at any point in time. Then, we have the following:
* Start time (in units of rounds) of a packet in the bit-by-bit model is either:
--> 1. Finish time (in rounds) of the previous packet in the same flow (if the queue was backlogged), or
--> 2. Current round number (if the queue was not backlogged).
--> Combine the two by taking the max of 1 and 2.
* What is the finish time (in rounds) of this packet?
--> Start time (in rounds) + length of the packet in bits. This is because we send one bit of the packet per round.
* Need to track finish times on a per-flow basis (in addition to head and tail pointers).
* The packet-by-packet model makes two approximations relative to the bit-by-bit model (see the first sketch at the end of these notes):
--> Send the packet with the smallest finish time, to "catch up" with the bit-by-bit model.
--> Set r to the finish time of the packet currently in service.
--> Can prove that these don't introduce *too much* extra delay in the worst case relative to bit-by-bit. Again, look at the Parekh-Gallager papers if you're interested.

Historical aside:
=====================
* In 1995, an algorithm called Deficit Round Robin (a variant of round robin, but with an important fix) was developed.
* It didn't have WFQ's delay properties, but wasn't too bad delay-wise either.
* But it was much simpler! It eventually found its way into all major routers.
* This is what is on routers today, even though WFQ is better delay-wise and came earlier.
* Meta point: simplicity is important.
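To make the finish-time bookkeeping above concrete, here is a minimal single-link sketch, assuming equal weights (i.e., plain fair queueing rather than the weighted version) and using the r-approximation from the notes. The class name and the toy packet sizes are invented:

    import heapq

    class WFQScheduler:
        """Minimal fair-queueing sketch (equal weights). Finish times are in
        'rounds'; per the approximation above, r jumps to the finish time of
        each packet as it enters service."""

        def __init__(self):
            self.r = 0.0           # current round number (approximated)
            self.last_finish = {}  # per-flow state: finish time of last packet
            self.heap = []         # (finish_time, seq, flow, size) per queued packet
            self.seq = 0           # tie-breaker so the heap never compares flows

        def enqueue(self, flow, size_bits):
            # Start time = max(finish time of this flow's previous packet,
            # current round number).
            start = max(self.last_finish.get(flow, 0.0), self.r)
            finish = start + size_bits  # one bit per round => +length in rounds
            self.last_finish[flow] = finish
            heapq.heappush(self.heap, (finish, self.seq, flow, size_bits))
            self.seq += 1

        def dequeue(self):
            # Transmit the packet with the smallest finish time next.
            if not self.heap:
                return None
            finish, _, flow, size = heapq.heappop(self.heap)
            self.r = finish  # second approximation: r := finish time in service
            return flow, size

    # Toy usage: one big packet from A, then small packets from B. B's packets
    # finish earlier in the bit-by-bit model, so they are sent first.
    wfq = WFQScheduler()
    wfq.enqueue("A", 12000)    # 1500-byte packet
    for _ in range(3):
        wfq.enqueue("B", 800)  # 100-byte packets
    while (pkt := wfq.dequeue()) is not None:
        print(pkt)             # B, B, B, then A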
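The notes don't spell out DRR's fix, but the standard idea is a per-queue deficit counter: each backlogged queue earns a quantum of bytes per round and sends only while its accumulated credit covers the packet at the head. A sketch under those assumptions (the quantum and flows are invented):

    from collections import deque

    def drr(queues, quantum=500, rounds=6):
        # Each backlogged queue earns `quantum` bytes of credit per round and
        # sends while the credit covers the head packet; leftover credit
        # carries over, so large packets eventually get their turn.
        deficit = {name: 0 for name in queues}
        log = []  # transmission order: (flow, packet_size)
        for _ in range(rounds):
            for name, q in queues.items():
                if not q:
                    deficit[name] = 0  # empty queues don't hoard credit
                    continue
                deficit[name] += quantum
                while q and q[0] <= deficit[name]:
                    pkt = q.popleft()
                    deficit[name] -= pkt
                    log.append((name, pkt))
        return log

    # Toy usage: bytes served per flow even out despite unequal packet sizes
    # (3000 bytes each after six rounds).
    queues = {"A": deque([1500] * 4), "B": deque([100] * 60)}
    for flow, size in drr(queues):
        print(flow, size)

Note how little state this needs compared to WFQ's finish-time heap: one counter per queue and O(1) work per packet, which is much of why it won on simplicity.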