Congestion Avoidance and Control
===================================

* Published in 1988
* Among the most influential papers in networking
* Likely the first *implementation* of a congestion-control algorithm
* Built on ideas introduced by the DECbit work at Digital Equipment Corporation (DEC), but made them practical.
* Likely the most important reason for its success: it was actually implemented in the Berkeley Software Distribution variant of UNIX, version 4.3.
* Aside: One variant of BSD 4.3 was called Tahoe, which lent its name to the first version of TCP congestion control: TCP Tahoe. So TCP Tahoe is the name of the algorithm described in the paper, even though the paper doesn't say as much.
* Other critical reason for its success: timeliness. The Internet was in bad shape and desperately needed a solution.

The one big conceptual idea
===================================

* The articulation of the packet conservation principle ("A new packet isn't put into the network until an old packet leaves") as a fundamental axiom for congestion control. (We'll see in the next paper how it isn't really that fundamental after all.)

Interesting algorithmic ideas
===================================

* A new method for retransmission timeout estimation using RTT variance
* Exponential backoff when a retransmitted packet is lost
* Slow start
* Congestion avoidance
* Work was seminal ...
  --> These ideas are now taught in every networking class.
  --> Spurred the development of many better congestion-control algorithms.
* ... and impactful:
  --> Between 1988 and ~2004, almost every Internet-connected computer in the world ran a slightly modified version of this algorithm (called TCP NewReno).
  --> Since 2004, other congestion-control algorithms have popped up (e.g., BIC/CUBIC on Linux, Compound on Windows).

The problem it was fixing
===================================

* Congestion collapse: the network is delivering packets at close to its capacity, but many of these packets are retransmissions of previous packets.
* Another definition of congestion collapse: the network is doing work (delivering packets), but not all of it is useful (duplicates).
* The older version of TCP (RFC 793 TCP) would dump an entire large window of packets onto the network in one shot, overflowing the buffers and causing persistent retransmissions.
* Figure 3 shows what this old behavior looked like.

Fixing this large-window problem
===================================

* Slow start: need to determine the right window, which depends on how many other sender-receiver pairs there are.
* Easiest way to home in on a number of unknown size is to grow the guess geometrically (in the spirit of binary search), which is what slow start does (sketched after this list).
* Start from 1, increase by 1 on every ACK => exponential increase over time.
* Stop when you detect a packet loss.
* How do you detect a packet loss? Implicitly, through a timeout. Fast retransmit (mentioned but not discussed in the paper) fixed this using data-driven loss recovery: duplicate cumulative ACKs carrying the same sequence number indicate that something was lost (in practice, TCP waits for three duplicates before retransmitting).
* Aside: Why were ACKs in TCP cumulative?
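
A minimal sketch of the slow-start logic in the list above, assuming a window counted in whole packets; the struct and function names are illustrative, not the BSD 4.3 code:

    /* Slow start: probe for the right window by starting at one packet
     * and adding one packet per ACK, which doubles the window every RTT. */
    struct tcp_state {
        unsigned int cwnd;   /* congestion window, counted in packets */
    };

    /* Open the connection with a single packet in flight. */
    void slow_start_init(struct tcp_state *s) {
        s->cwnd = 1;
    }

    /* For every ACK of new data, add one packet to the window. */
    void slow_start_on_ack(struct tcp_state *s) {
        s->cwnd += 1;
    }

    /* A timeout means the probe overshot the network's capacity:
     * Tahoe starts over from a window of one packet. */
    void slow_start_on_timeout(struct tcp_state *s) {
        s->cwnd = 1;
    }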

Retransmission timers
===================================

* Need to estimate when to time out and retransmit.
* Too late and you hold up delivery of data. Why? Because TCP provides an in-order reliable bytestream.
* Too early and you risk congestion collapse.
* Part of RFC 793 TCP's problem: incorrect timeout estimation using a constant factor beta over the mean RTT.
* This paper introduced timeout estimation using the variance in addition to the mean (a sketch appears after the AIMD discussion below).
* Two positive consequences:
  --> The retransmission timeout tracks the actual RTT samples more closely because it is adaptive (compare Figures 5 and 6).
  --> Better tracking means the retransmission timeout has a higher chance of being *just right*: neither so large that it holds up packets nor so small that it causes duplicates.

Exponential backoff
===================================

* If the retransmitted packet is also lost, what does that mean?
* It means you could be WAY off.
* Easiest fix: keep doubling the retransmission timeout until you get an ACK back. When you get an ACK, reset the timeout estimation logic.
* In practice, there is an upper bound on how large the retransmission timeout can be (typically 60 seconds).

Congestion avoidance
===================================

* Gentler adjustments to the window in steady state (sketched after the AIMD discussion below).
* Increase by 1 every RTT (additive increase). In practice, increase by 1/cwnd on every ACK to keep it more gradual.
* Decrease by a factor of 2 on every loss, i.e., timeout (multiplicative decrease).
* Technical aside: congestion avoidance was borrowed from DECbit. DECbit detected congestion by setting a bit, while Jacobson relied on packet losses, a more universally and easily deployable signal.

Why Additive Increase Multiplicative Decrease (AIMD)?
===================================

* What about the other three combinations?
* The paper says why multiplicative decrease is important: delays grow exponentially during congestion and must be countered by draining packets out exponentially (recall: a multiplicative decrease on each loss is an exponential decrease over time).
* Additive increase is barely motivated.
* A much cleaner visual way of thinking about it is using Chiu-Jain plots.
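
Putting the congestion-avoidance rules above together with slow start, a minimal sketch of a Tahoe-style window adjustment; cwnd and ssthresh follow TCP naming convention, but the code itself is an illustrative assumption, not the paper's implementation:

    struct cc_state {
        double cwnd;       /* congestion window, in packets */
        double ssthresh;   /* threshold separating slow start from avoidance */
    };

    /* For every ACK of new data: exponential growth below ssthresh
     * (slow start), roughly one extra packet per RTT above it
     * (additive increase). */
    void cc_on_ack(struct cc_state *s) {
        if (s->cwnd < s->ssthresh)
            s->cwnd += 1.0;
        else
            s->cwnd += 1.0 / s->cwnd;
    }

    /* On a loss (a timeout in Tahoe): remember half the window that
     * overflowed (multiplicative decrease) and restart slow start
     * from a window of one packet. */
    void cc_on_timeout(struct cc_state *s) {
        s->ssthresh = s->cwnd / 2.0;
        if (s->ssthresh < 2.0)
            s->ssthresh = 2.0;   /* common floor of two packets; an assumption, not from the paper */
        s->cwnd = 1.0;
    }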
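
And, going back to the retransmission-timer and exponential-backoff sections above, a minimal sketch of the mean-plus-variance timeout estimator with backoff; the gains of 1/8 and 1/4 and the variance multiplier of 4 follow common TCP practice (the paper's appendix does the same filtering in scaled integer arithmetic), so treat the constants and names here as assumptions:

    struct rto_state {
        double srtt;     /* smoothed RTT estimate, in seconds */
        double rttvar;   /* smoothed mean deviation of the RTT, in seconds */
        double rto;      /* current retransmission timeout, in seconds */
    };

    /* Upper bound on the timeout, as noted above (typically ~60 seconds). */
    #define RTO_MAX 60.0

    /* Feed in a new RTT measurement m whenever an ACK provides one:
     * low-pass filter both the mean and the deviation, and set the
     * timeout to the mean plus a multiple of the deviation. */
    void rto_on_rtt_sample(struct rto_state *s, double m) {
        double err = m - s->srtt;
        s->srtt += err / 8.0;
        if (err < 0.0)
            err = -err;
        s->rttvar += (err - s->rttvar) / 4.0;
        s->rto = s->srtt + 4.0 * s->rttvar;
    }

    /* If a retransmitted packet is lost too, the estimate could be way off:
     * keep doubling the timeout (exponential backoff), up to the cap. */
    void rto_on_retransmit_timeout(struct rto_state *s) {
        s->rto *= 2.0;
        if (s->rto > RTO_MAX)
            s->rto = RTO_MAX;
    }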

Gateway congestion control
===================================

* The paper hints at one problem with purely end-to-end congestion control.
* Fairness is hard to guarantee. Even now, TCP guarantees fairness only at the multi-second time scale, i.e., averaged over multiple seconds, two flows sharing a single bottleneck link will each get half the link.
* But TCP doesn't do so well on fairness at < 1 second granularity.
* In some contexts (datacenters), flows don't even last a few ms!
* Intelligence in the network can help: XCP and WFQ provide progressively more sophisticated algorithms that incorporate in-network intelligence. In some ways, WFQ is optimal. Will talk about this next class.

Other details in the paper
===================================

* Optimizing the RTT mean and variance calculation using integer arithmetic instead of floating point.
* Likely very important in 1988 with very constrained processors.
* Might be important even now in some regimes (e.g., datacenters): processors are much faster than in 1988, but so are link speeds.
* More experiments in the appendix: Figures 10 and 11 are especially worth understanding deeply.
* Figure 7 is now a staple of every networking paper: the dumbbell topology.

Overall summary of the paper
===================================

* A paper that had both academic and real-world impact.
* Timely solution, well thought out, and implemented in an actual OS. This is why TCP Tahoe succeeded where DECbit didn't (DECbit was implemented only in a simulator). Another story: DECbit was standardized as ECN in the late 1990s and eventually used in DCTCP in 2010.
* Meta research message: the importance of being timely and proposing simple solutions.
* But it is not without its drawbacks:
  --> No real exploration of the parameter space.
  --> Limited experiments by today's standards (although probably far more than the standards of 1988).
  --> Many hand-waving references to control theory and math (it turns out the theory of congestion control is still quite open).
  --> Footnotes as long as the paper itself!