1. In the context of the VL2 paper, what does the term "agility" mean?
Ans: Agility is the ability to assign any physical server in the datacenter to any service, regardless of the server's location. To achieve this, a service must keep the same application address even when its VM is migrated to a different physical server with a different locator address.

2. What does the term "bufferbloat" mean?
Ans: Extremely long per-packet latencies caused by persistently long queues, which in turn are enabled by the very large buffers found in many networks (e.g., cellular and WiFi) now that memory is cheap.

3. Which of these 2 protocols consumes more router memory for its implementation and why: weighted fair queueing (WFQ) or explicit control protocol (XCP)?
Ans: WFQ, because it maintains per-flow state on the router. XCP instead carries per-flow state in packets.

4. Does it make sense for an ISP to route traffic originating from the ISP's customers to one of the ISP's peer's customers? Does it make sense for an ISP to route traffic originating from the ISP's provider to one of the ISP's peer's customers? Say why or why not in each case.
Ans: Yes in the first case, because the ISP makes money from its customers and pays nothing to its peer. No in the second case, because the ISP pays money to its provider and makes nothing off its peer.

5. Explain how you can implement multihoming using the border gateway protocol.
Ans: Advertise 2 paths to the same IP prefix with different AS paths. In one of the AS paths, artificially add ASes (AS-path prepending) to make that path look worse than the other on the metric of AS-path length.

6. Why does VL2 perform load balancing at the level of flows instead of packets? What is the drawback of performing load balancing at the level of flows instead of packets?
Ans: Packet-level load balancing can reorder packets, leading to retransmissions from TCP and reductions in the congestion window.
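As a concrete illustration of flow-level load balancing, here is a minimal Python sketch of ECMP-style link selection by hashing a flow's 5-tuple. The function and field names are illustrative assumptions, not taken from the VL2 paper:

```python
import hashlib

def flow_hash_link(src_ip, dst_ip, src_port, dst_port, proto, n_links):
    """Pick an outgoing link by hashing a flow's 5-tuple.

    Illustrative sketch: real switches use hardware hash functions,
    not SHA-256, but the idea is the same.
    """
    key = f"{src_ip},{dst_ip},{src_port},{dst_port},{proto}".encode()
    digest = hashlib.sha256(key).digest()
    # Map the hash onto one of the equal-cost links.
    return int.from_bytes(digest[:4], "big") % n_links

# Every packet of the same flow hashes to the same link,
# so packet order within the flow is preserved.
a = flow_hash_link("10.0.0.1", "10.0.0.2", 12345, 80, "tcp", 4)
b = flow_hash_link("10.0.0.1", "10.0.0.2", 12345, 80, "tcp", 4)
assert a == b
```

Because the hash input is identical for every packet of a flow, all of the flow's packets traverse the same link; the downside is that two long flows can hash onto the same link.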
Flow-level load balancing can leave the load unbalanced, because 2 long flows may end up on the same link.

7. BBR uses a windowed minimum for RTT estimation and a windowed maximum for bandwidth estimation. Why is it a minimum in the first case and a maximum in the second case?
Ans: Any measured RTT is the minimum RTT plus queueing delay, so measurements can only overestimate the minimum RTT. Likewise, any measured throughput is at most the bottleneck bandwidth, so measurements can only underestimate it. Hence, the best estimate of the minimum RTT takes a minimum over measured RTTs, and the best estimate of the bandwidth takes a maximum over bandwidth samples.

8. What is the packet conservation principle? How can it be violated by a congestion control algorithm?
Ans: Inject a new packet into the network only when another packet leaves it. This is violated by retransmitting a packet while the original copy is still sitting in queues and hasn't been dropped yet.

9. Describe the phenomenon of congestion collapse that led to the development of congestion control in the congestion avoidance and control paper.
Ans: The workload offered to the network (number of flows) keeps increasing, and the bottleneck link is fully occupied transmitting packets, yet many of these packets are duplicates of each other. The effective utility of the network (measured by the number of unique packets delivered per second) is therefore quite low even though the throughput is high.

10. What are the Gao-Rexford conditions for stable Internet routing without global coordination?
Ans: Prefer a customer's route over a peer's route. Prefer a peer's route over a provider's route.

11. If you are given a k-port switch, how many hosts can you support using a 2-layer leaf-spine topology with full bisection bandwidth? Show your work.
Ans: k**2/2 hosts. Use k switches at the leaf layer and k/2 at the spine layer. Each leaf switch hosts k/2 servers, and its remaining k/2 ports are connected to all of the spine switches, giving k * (k/2) = k**2/2 hosts in total.

12.
The WFQ paper discusses how different definitions of fairness all have their limitations. Let us consider two definitions of fairness: one in which each user (unique source address) is allocated an equal share of the link's capacity, and one in which each flow (unique combination of source and destination addresses and ports) is allocated an equal share of the link's capacity. Describe how each definition of fairness can be abused.
Ans: Per-user fairness can be abused by one entity taking over many hosts and thereby getting an arbitrarily large fraction of the link's capacity. Per-flow fairness can be abused by opening many connections with the same source address and source port but different destination ports and/or addresses, each of which counts as a separate flow.

13. How does DCTCP address the incast problem that causes loss-based congestion control to perform poorly?
Ans: By marking packets early to prevent queues from building up rapidly when multiple flows converge on the switch at once. This way, the incast senders slow down much earlier than they would under loss-based congestion control.

14. What are the 3 different approaches to handling controller failures in Ethane?
Ans: Cold standby, warm standby, and fully replicated. See Section 3.5 of the Ethane paper for more details.

15. Consider a variant of the congestion avoidance and control protocol where, instead of an additive increase and multiplicative decrease, we perform a multiplicative increase and additive decrease. Would such a congestion-control algorithm perform well? Why or why not?
Ans: No. The multiplicative increase is too aggressive to be counteracted by an additive decrease. This would rapidly lead to very large windows at every sender and a congestion-collapse situation triggered by spurious retransmissions.

16. Let's say you are a website operator and you suddenly find yourself under a distributed denial-of-service attack from many clients trying to access the website at the same time.
How could you use BGP to prevent your webserver from being overwhelmed by more traffic than it can handle?
Ans: You can use BGP to advertise a different AS-level path for your IP address. This path redirects traffic destined for your IP address to a "protector" service (whom you pay), which scrubs the traffic and sends you only legitimate-looking traffic.

17. Why does the first packet of every new flow in Ethane have to be sent to the Ethane controller?
Ans: There are no forwarding entries by default in Ethane, and switches only hold forwarding entries for active flows. The first packet of a flow is sent to the Ethane controller so it can compute the path for that flow's packets. The Ethane controller then installs forwarding entries in the switches along that path.

18. How does DCTCP implement its marking of packets during times of congestion? What is the recommended threshold for marking packets in DCTCP?
Ans: Marking is implemented using the RED algorithm with the upper and lower marking thresholds both set to K. K's recommended value is 1/7 of the bandwidth-delay product.

19. How does DCTCP estimate the fraction of marked packets from a sequence of marked/unmarked packets? How is this fraction then used during the window decrease process of the congestion-control algorithm?
Ans: By dividing the number of marked packets by the total number of packets in the last window of data (call this fraction F). DCTCP then applies a moving-average filter to F to estimate the average fraction alpha. Alpha is used to decrease the window in proportion to the level of congestion: cwnd becomes cwnd * (1 - alpha/2).

20. Assume that an ALU within the RMT architecture is capable of implementing the 2 operations pkt.f1 + pkt.f2 and pkt.f1 - pkt.f2. Here, pkt represents a packet and f1 and f2 are packet headers. How many pipeline stages would you need to implement the operation pkt.f1 + pkt.f2 + pkt.f3 - pkt.f4? What operations would run in each pipeline stage?
Ans: Two stages. The first stage does f1+f2 and f3-f4 in parallel.
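This two-stage schedule can be sketched with a toy Python model. Treating a packet as a dict of header fields and using tmp1/tmp2 as scratch headers are assumptions of the sketch, not the RMT hardware model:

```python
# Toy model: each pipeline stage may run independent ALU operations in
# parallel, but a result computed in one stage is only readable in the
# next stage. tmp1/tmp2 are hypothetical scratch header fields.

def stage1(pkt):
    # Two independent ALU ops execute in parallel within one stage.
    pkt["tmp1"] = pkt["f1"] + pkt["f2"]
    pkt["tmp2"] = pkt["f3"] - pkt["f4"]
    return pkt

def stage2(pkt):
    # The second stage combines the two partial results.
    pkt["result"] = pkt["tmp1"] + pkt["tmp2"]
    return pkt

pkt = {"f1": 5, "f2": 3, "f3": 10, "f4": 4}
pkt = stage2(stage1(pkt))
assert pkt["result"] == 14  # 5 + 3 + 10 - 4
```

The key constraint the sketch captures is the data dependency: tmp1 and tmp2 must both exist before they can be added, which is why one stage is not enough.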
The results are written into scratch headers tmp1 and tmp2. The second stage then adds tmp1 and tmp2.

21. What is an Internet Exchange Point (IXP)? In the context of IXPs, what problem is SDX attempting to solve?
Ans: An IXP is a physical location (typically built around a shared switch) where multiple ISPs connect for the purpose of peering. SDX tries to make it much easier to express peering policies between ISPs located at an IXP.

22. What is application-specific peering?
Ans: Peering between 2 ISPs for one kind of traffic (e.g., YouTube), but not other kinds of traffic (e.g., web traffic).

23. What does the term "velocity" in the context of the Espresso paper mean?
Ans: The ability to quickly add features and change routing policy.

24. How are forwarding loops detected by the BGP protocol?
Ans: BGP is a path-vector protocol in which every routing announcement carries the whole AS-level path. Loops are detected by checking for duplicate ASes in the AS-level path: an AS rejects any announcement whose AS path already contains its own AS number.

25. Explain how reliability can be implemented without any router support. Explain how reliability can then be improved with router support. Does the Internet require router support to implement reliability? Why or why not?
Ans: Reliability can be implemented by tracking unacked packets/bytes at the sender and retransmitting upon a timeout. It can be improved by having routers provide reliable delivery from one router to the next. The Internet does not require router support to implement reliability because the original design sought to make the least possible demands on the router (best-effort packet forwarding).

26. Explain the functionality of the directory system in VL2.
Ans: To maintain the mapping from application addresses to locator addresses and to update this mapping whenever a service migrates.

27. What are the kinds of flexibility RMT provides relative to a fixed-function switch? What is the kind of flexibility RMT does not provide?
Ans: Flexible parsing, lookups, and actions.
Flexible sizing of tables. It does not provide programmable scheduling, state manipulation, or deep packet processing.

28. What shortcomings of the OpenFlow standard did the P4 paper attempt to remedy?
Ans: Fixed match fields. Fixed actions. Pre-allocation of stages to tables. A constantly growing list of match fields and actions in the standard.

29. What are the responsibilities of the control and data planes? In the context of these 2 planes, what was the main idea behind software-defined networking?
Ans: Control plane: discover routes and enforce policy (e.g., access control). Data plane: forward packets according to the routing tables. In SDN, the control plane was moved out of the routers into a separate central location, where it could be scaled independently using distributed-systems techniques.

30. What benefits does WFQ provide that XCP does not?
Ans: WFQ provides isolation between flows and prevents one misbehaving flow from affecting the throughput of another.

31. Describe the difference between the kind of router feedback used by XCP and DCTCP.
Ans: XCP: multi-bit feedback written by routers into a congestion header carried in each packet. DCTCP: single-bit feedback from routers that is then converted into multi-bit feedback at the sender.

32. Explain why standard TCP congestion control (i.e., the algorithm in the congestion avoidance and control paper) does not perform well in networks with high bandwidth-delay product.
Ans: To achieve full utilization at a high bandwidth-delay product, the loss probability needs to be absurdly low. See https://cs.nyu.edu/~anirudh/CSCI-GA.2620-001/lectures/xcp_motivation.pdf for a description of how the loss probability must scale with the inverse square of the BDP for full utilization. This is too strong a requirement, especially given that the loss probability goes up as the number of routers on the path from sender to receiver increases.

33. What is the "optimum operating point" at which BBR attempts to operate? Why is this a desirable operating point?
Ans: The optimum operating point has the highest throughput and the lowest delay: the flow runs at the bottleneck bandwidth while experiencing the minimum RTT. This is desirable because it is the best possible behavior for a transport protocol: the pipe stays full without building a queue. It also happens to be the intersection of the app-limited and bandwidth-limited regions of a congestion-control protocol.
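As a worked example of this operating point: the amount of data in flight at the optimum equals the bandwidth-delay product (BDP). The sketch below computes it with made-up numbers:

```python
# At BBR's optimum operating point, data in flight = BDP:
# the pipe is full, but no queue builds at the bottleneck.
# The link parameters below are made up for illustration.

bottleneck_bw_bytes_per_s = 12_500_000   # 100 Mbit/s
min_rtt_s = 0.040                        # 40 ms

bdp_bytes = bottleneck_bw_bytes_per_s * min_rtt_s
print(bdp_bytes)  # 500000.0 bytes in flight fills the pipe

# With less than the BDP in flight, the flow is app-limited
# (throughput < bottleneck bandwidth); with more, the excess
# only sits in the bottleneck queue and adds delay.
```

This is why BBR estimates both quantities: the windowed-max bandwidth and windowed-min RTT together pin down the BDP and hence the target amount of in-flight data.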