We’ve spent ages becoming familiar with TCP congestion avoidance and mastering the subtleties of New Reno and Vegas, but then we learnt that these schemes perform very poorly in modern networks. So now we must face up to the sophistication of the latest algorithms such as BBRv3. Fortunately, the best way to understand them is, again, with queue modelling and richly-detailed flow charts. Not just a few data points summarizing individual test results, but the equivalent of a high-resolution, slow-motion video showing how the algorithms behave, packet-by-packet, throughout a data flow.
The charts below were produced while trying out a new analytical function added to NetData. After analysing a pcap file, it can determine the speed of a downstream bottleneck, infer which packets were lost downstream, plot the length of the packet queue behind the bottleneck, and show the queue length at the moments packets were lost. NetData can also plot bandwidth use derived from the lengths of captured packets, and the throughput implied by the rate of returning acks, but a much more accurate speed estimate is required for, and is obtained from, discrete-event models of queue behaviour. An accurate model gives unique insight into the behaviour of congestion-avoidance algorithms (CAAs), showing the relationships between bytes-in-flight (the congestion window, CWND), bandwidth use and throughput, packet loss, queue length, queue waiting times, and round-trip times.
This is the only sensible way to investigate problems arising from microbursts, because the probability of packet loss is not a function of peak bandwidth alone but depends critically on buffer size and the pattern of bursts over time. NetData may be the only packet-analysis tool that takes buffering into account and can show us the source and content of every packet in the downstream queue, at any time.
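To make the idea concrete, here is a rough sketch of such a discrete-event queue model. It is not NetData's implementation: the function name, the 10 Mbps bottleneck, the 64 KB buffer and the microburst in the example are all invented for illustration. Packets are replayed through a fixed-speed bottleneck with a finite FIFO buffer, the queue depth is recorded at every arrival, and a tail drop is inferred whenever a packet would overflow the buffer.

def model_queue(packets, egress_bps, buffer_bytes):
    """packets: iterable of (timestamp_s, size_bytes), sorted by timestamp."""
    egress_Bps = egress_bps / 8.0      # bottleneck drain rate, bytes per second
    busy_until = 0.0                   # time at which the queue would drain completely
    samples, drops = [], []
    for i, (ts, size) in enumerate(packets):
        backlog = max(0.0, busy_until - ts) * egress_Bps   # bytes still queued on arrival
        if backlog + size > buffer_bytes:
            drops.append(i)            # buffer overflow: infer a downstream tail drop
        else:
            busy_until = max(busy_until, ts) + size / egress_Bps
            backlog += size
        samples.append((ts, backlog))  # queue length to plot against time
    return samples, drops

# Example: a microburst of 1,000 1500-byte frames arriving at 100 Mbps into a
# 10 Mbps bottleneck with a 64 KB buffer (all figures invented).
burst = [(i * 1500 * 8 / 100e6, 1500) for i in range(1000)]
samples, lost = model_queue(burst, egress_bps=10e6, buffer_bytes=64_000)
print(f"peak queue {max(q for _, q in samples):,.0f} bytes, {len(lost)} packets dropped")

Changing only the buffer size or the spacing of the burst changes the number of inferred drops, which is exactly the point about microbursts made above.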
But we have digressed. A graph of queue length, overlaid with graphs of throughput and bytes-in-flight for individual connections and with plots of RTT for individual packets, affords a unique understanding of CAAs. All the charts here are drawn from packets captured by Vladimir Gerasimov while two TCP connections, driven by FlowGrind, were transferring data as fast as possible to the same destination, each using a different CAA.
This chart displays bytes-in-flight (BIF) and the traffic rate of the two connections, one using Reno congestion avoidance (red) and the other using the Cubic algorithm (blue). After each group of packet losses (purple square markers), which occurred when the queue occupied 450 KB (0.9 × BDP), Reno halved its congestion window (CWND) and BIF, whereas Cubic reduced BIF by only 30% and rapidly raised it back to the preceding window size before exploring an even larger rate. Although the two connections started with the same throughput, together totalling 40.1 Mbps, after 150 seconds Reno had dropped below 8 Mbps while Cubic had exceeded 32 Mbps. Cubic, widely used by Linux, makes better use of the available bandwidth but steals bandwidth unfairly from older schemes such as Reno.
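The different recovery shapes follow from the two algorithms' window rules. The sketch below is a simplification (Cubic constants from RFC 8312; the 300-segment window and 100 ms RTT are illustrative values, not measurements from this trace), but it shows why Cubic regains and then exceeds its old window while Reno is still climbing linearly.

# Simplified window rules behind the shapes in the chart (not kernel code).
# On loss, Reno halves cwnd and then grows by one segment per round trip.
# Cubic (RFC 8312) cuts cwnd by only 30% and grows along a cubic curve that
# flattens near the previous maximum W_max before probing beyond it.

def reno_cwnd(w_max, rtts_since_loss):
    """Reno cwnd (segments) a given number of round trips after a loss at w_max."""
    return 0.5 * w_max + rtts_since_loss

def cubic_cwnd(w_max, t_seconds, C=0.4, beta=0.7):
    """Cubic cwnd (segments) t seconds after a loss at w_max."""
    K = (w_max * (1 - beta) / C) ** (1 / 3)   # time at which the curve returns to w_max
    return C * (t_seconds - K) ** 3 + w_max

# Illustrative figures: a 300-segment window (about 450 KB of 1500-byte
# segments) lost with a 100 ms round-trip time.
w_max, rtt = 300, 0.1
for t in (0, 2, 4, 6, 8):
    print(f"t={t}s  Reno={reno_cwnd(w_max, t / rtt):5.0f}  Cubic={cubic_cwnd(w_max, t):5.0f}")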
This chart also exemplifies a common problem with CAAs of this type. When the queue buffer overflows and packets are dropped from the tail of the queue, many, if not all, of the connections reduce their flow at the same time and risk wasting the network’s available bandwidth. In this case, however, there was just enough buffer space to ensure that the queue did not remain empty for long, which is another reason why it is so useful to track queue length. New CAA schemes have been proposed that restrict packet loss to a single connection: a victim that will be least affected by packet-loss delays is chosen and its packets are dropped just before the buffer fills, leaving buffer space for the packets of other connections.
The purple markers indicating a downstream packet loss are overlaid on the graphs of both queue length and bytes-in-flight. The markers on the bytes-in-flight graphs make clear which connections suffered a loss; in this case the Reno connection was fortunate to avoid losses on two occasions when the queue overflowed.
Vegas and Reno
Not all congestion-avoidance schemes treat Reno so poorly. The next chart matches Reno with Vegas, a scheme designed to detect congestion by monitoring round-trip times and reducing BIF when the acknowledgement rate falls significantly below the expected rate.
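In its classic form the Vegas rule is simple enough to sketch in a few lines; the alpha and beta thresholds below are the textbook values, not anything measured from this trace.

# A minimal sketch of the classic Vegas decision rule (the Linux module adds
# many refinements). Vegas compares the throughput its window should achieve
# with the throughput the acks actually imply; the difference estimates how
# many of its segments are sitting in a queue, and it backs off before any
# packet is lost.

def vegas_update(cwnd, base_rtt, current_rtt, alpha=2, beta=4):
    """cwnd in segments; RTTs in seconds. Returns the cwnd for the next round trip."""
    expected = cwnd / base_rtt               # rate if nothing were queued
    actual = cwnd / current_rtt              # rate implied by the measured RTT
    queued = (expected - actual) * base_rtt  # estimated segments held in queues
    if queued < alpha:
        return cwnd + 1                      # path looks idle: grow linearly
    if queued > beta:
        return cwnd - 1                      # queue building: back off gently
    return cwnd                              # in between: hold steady

# Example: a 100 ms path whose RTT has crept up to 120 ms under load.
print(vegas_update(cwnd=100, base_rtt=0.100, current_rtt=0.120))   # -> 99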
BBR and Reno
The next chart illustrates the behaviour of a quite different CAA, replacing Vegas with BBR (Bottleneck Bandwidth and Round-trip propagation time). BBR achieves higher throughput and much less sensitivity to random packet losses because it does not depend on packet loss to detect congestion. It regularly measures the minimum round-trip time – the path’s propagation delay – and the available (bottleneck) bandwidth in what are called probes. It feeds these two parameters to a network-path model and state machine which outputs three parameters to control the TCP sending engine: the sending or pacing rate; the allowed volume in the network (bytes in flight); and the quantum (burst size). In addition to the regular measurements, BBR dynamically adjusts the estimates for bandwidth and minimum RTT on receipt of every TCP Ack.
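A much simplified sketch of that model follows, based on the published BBRv1 design rather than any particular kernel source; the class and method names, the filter lengths and the quantum rule are approximations invented for the example.

from collections import deque

# Simplified sketch of BBRv1's model (after the published design, not the
# kernel). Bandwidth is a windowed maximum of recent delivery-rate samples and
# RTprop a windowed minimum of recent RTTs; the deques are crude stand-ins for
# BBR's real max/min filters.

class BbrModel:
    def __init__(self, mss=1500):
        self.mss = mss
        self.bw_samples = deque(maxlen=10)     # roughly the last 10 round trips
        self.rtt_samples = deque(maxlen=100)   # roughly the last 10 seconds
        self.pacing_gain = 1.0                 # varied by the state machine (ProbeBW, ProbeRTT, ...)
        self.cwnd_gain = 2.0

    def on_ack(self, delivery_rate_bps, rtt_s):
        # Every ack refines both estimates, as noted above.
        self.bw_samples.append(delivery_rate_bps / 8.0)
        self.rtt_samples.append(rtt_s)

    def controls(self):
        btl_bw = max(self.bw_samples)          # bottleneck bandwidth estimate, bytes/s
        rt_prop = min(self.rtt_samples)        # propagation-delay estimate, seconds
        bdp = btl_bw * rt_prop                 # bandwidth-delay product, bytes
        pacing_rate = self.pacing_gain * btl_bw
        cwnd = self.cwnd_gain * bdp            # allowed bytes in flight
        # Quantum: roughly 1 ms of data at the pacing rate, clamped (illustrative rule).
        quantum = max(2 * self.mss, min(int(pacing_rate * 0.001), 64 * 1024))
        return pacing_rate, cwnd, quantum

# Example: a 50 Mbps bottleneck with 100 ms of propagation delay.
m = BbrModel()
m.on_ack(delivery_rate_bps=50e6, rtt_s=0.100)
print(m.controls())   # pacing ~6.25 MB/s, cwnd ~1.25 MB, quantum ~6 KB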
The few packet losses (purple squares) were a consequence of the slow rise in the Reno BIF (red). The BBR BIF (blue), although highly variable, was carefully constrained to avoid unnecessary buffer overflow. It showed spikes at intervals of 833 ms, when BBR briefly boosted BIF to measure the maximum bandwidth (a ProbeBW). The blue BIF spikes naturally coincide with the spikes in the brown graph of queue length.
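That 833 ms spacing is consistent with BBRv1's eight-phase ProbeBW gain cycle, each phase lasting roughly one minimum RTT (8 × ~104 ms ≈ 833 ms, in line with the ~100 ms RTT measured later). A sketch of the cycle, with the gains from the published design:

# BBRv1's ProbeBW pacing-gain cycle (gains from the published design). Each
# phase lasts roughly one minimum RTT, so with about 104 ms of propagation
# delay the whole cycle, and therefore each brief BIF spike, repeats every
# ~833 ms. The 1.25 phase over-fills the pipe briefly (the BIF and queue
# spikes); the 0.75 phase that follows drains the queue it created.
PROBE_BW_GAINS = [1.25, 0.75, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

def pacing_gain(elapsed_s, min_rtt_s=0.104):
    """Pacing gain in effect a given time after entering the ProbeBW state."""
    phase = int(elapsed_s / min_rtt_s) % len(PROBE_BW_GAINS)
    return PROBE_BW_GAINS[phase]

print([pacing_gain(t) for t in (0.0, 0.15, 0.30, 0.90)])
# -> [1.25, 0.75, 1.0, 1.25]  (the cycle has wrapped around by 0.90 s)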
The above chart also shows the regular drops in total throughput (black) and BBR BIF at 10-second intervals, caused by RTT probes, and shows that the reaction to packet losses caused the buffer to empty. Fortunately, the buffer was empty only briefly – producing the small drops in the thin black throughput graph – and there was little waste of the network’s bandwidth in this configuration.
On the above chart, at 22:14:50, Reno filled the queue buffer and reaction to the consequent packet losses (purple square markers) in both connections fully drained the queue.
This chart spans just 7 seconds and shows the regular BIF spikes (bandwidth probes) and one of the severe drops in throughput (a ProbeRTT) that occurred at 10-second intervals. The following TCP sequence chart reveals BBR behaviour during this RTT probe.
The start of the throughput drop coincided with a BIF peak, which accounts for the rising round-trip times on the left. The BBR sender waited for all the data in flight to be acknowledged and then sent only one burst of 4 segments. That process was repeated once before the normal flow resumed, and each iteration produced an estimate of the minimum RTT, here about 100 ms.
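This is BBRv1's ProbeRTT state: when the minimum-RTT estimate has not been refreshed for about 10 seconds, the sender cuts its window to a few segments for at least 200 ms plus one round trip, so that its own traffic drains from the queue and the measured RTT approaches the true propagation delay. A simplified sketch of the timing logic, per the published design:

# Simplified sketch of BBRv1's ProbeRTT timing (per the published design).
# The 10 s interval, the 4-segment window and the 200 ms floor explain the
# behaviour in the sequence chart above.
PROBE_RTT_INTERVAL = 10.0    # seconds since the min-RTT estimate was refreshed
PROBE_RTT_DURATION = 0.200   # minimum time to sit at the reduced window
PROBE_RTT_CWND = 4           # segments allowed in flight during the probe

def needs_probe_rtt(now, last_min_rtt_update):
    return now - last_min_rtt_update > PROBE_RTT_INTERVAL

def probe_rtt_done(now, probe_start, min_rtt):
    # Hold the tiny window for 200 ms plus one round trip, long enough for the
    # bottleneck queue to drain and for a clean RTT sample to be taken.
    return now - probe_start >= PROBE_RTT_DURATION + min_rtt

print(needs_probe_rtt(now=25.0, last_min_rtt_update=14.5))          # -> True
print(probe_rtt_done(now=25.31, probe_start=25.0, min_rtt=0.100))   # -> True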
The next chart zooms into the startup phase when the queue was filled twice in quick succession and more than 568 packets were lost. The queue was then allowed to drain fully – in compliance with the BBR specification – before BBR pushed an optimum number of packets into the network and began pacing the flow.
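That behaviour follows BBRv1's Startup and Drain states: the sender grows its rate sharply each round trip until the delivered bandwidth stops growing, then drains the queue Startup built before pacing at roughly one bandwidth-delay product. A simplified sketch of the "full pipe" test that ends Startup (the bandwidth figures in the example are invented):

# Simplified sketch of BBRv1's Startup exit test (per the published design).
# Startup roughly doubles the sending rate every round trip; once the measured
# bandwidth has failed to grow by ~25% for three consecutive rounds the pipe
# is judged full, and BBR enters Drain to empty the queue Startup created,
# i.e. the fully drained queue seen above, before pacing at the estimated BDP.
STARTUP_GROWTH_TARGET = 1.25   # required round-on-round bandwidth growth
FULL_BW_ROUNDS = 3             # rounds without growth before declaring the pipe full

class FullPipeDetector:
    def __init__(self):
        self.full_bw = 0.0
        self.stalled_rounds = 0

    def update(self, round_bw):
        """Call once per round trip with the measured delivery rate (bits/s)."""
        if round_bw >= self.full_bw * STARTUP_GROWTH_TARGET:
            self.full_bw = round_bw            # still growing: keep probing hard
            self.stalled_rounds = 0
            return False
        self.stalled_rounds += 1
        return self.stalled_rounds >= FULL_BW_ROUNDS   # True: leave Startup, enter Drain

detector = FullPipeDetector()
for bw in (10e6, 25e6, 48e6, 50e6, 50e6, 50e6):        # invented samples
    if detector.update(bw):
        print(f"pipe judged full at {bw / 1e6:.0f} Mbps")   # -> 50 Mbps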
The BBR behaviour shown here relates to the first BBR version. A second version has already been adopted by Microsoft and Google, and the specification for BBRv3 is presently being finalised. The later versions have several improvements including exploitation of TCP Explicit Congestion Notification (ECN).
Part 2 illustrates a common weakness in the slow-start phase of most TCP congestion-avoidance schemes and suggests a simple step to avoid that weakness. It shows how NetData measures the egress speed of downstream bottlenecks; how queue-modelling graphs are configured; and how NetData achieves a high accuracy in its graphs.
Downloading NetData
NetData’s flow chart has many more options than were useful here. To add the overlays one-by-one, investigate your own packet flows, and use NetData’s broad set of application decoders, download the free NetData Lite, with copious documentation, from this FTP account:
Domain name: measureit.serveftp.net
Browser URL: ftp://measureit.serveftp.net/
User name: NetDataLiteP (all letters)
Password: visual!ser (note the exclamation mark)
For further guidance on how to produce these charts, refer to the Charting Guide. If you would like any help, or the more powerful NetData Pro, email bob@netdata-pro.com.
Bob Brownell
Director and Principal Consultant, Measure IT Pty Ltd
Bob’s early engineering career included design and software development for computer equipment, real-time control systems and data networks with two communication firms in Sydney, and for packet-switching equipment with Alcatel in Antwerp. In his own firm over the last 25 years he has developed packet-analysis software and provided consultant services covering a wide range of IT performance issues. His objective for NetData has been to extract as much diagnostic information as possible from network traffic, reconstruct application behaviour, and present all findings graphically as clearly as possible.
The result has been that NetData users frequently resolve complex problems that have defeated all other analysis tools.
Bob graduated in science and engineering, earning a PhD from the University of Tasmania.