
Charting Downstream Bottlenecks and Congestion Avoidance – Part 2 (Robert Brownell)

Part 1 illustrates the operation of three congestion-avoidance algorithms (CAAs) – Vegas, Cubic and BBR – and their friendliness towards Reno, and examines several phases of BBR operation in detail.

The BBR behaviour shown in both parts relates to the first version of BBR. A second version has already been adopted by Microsoft and Google, and the specification for BBRv3 is presently being finalised. The later versions include several improvements, among them the use of TCP Explicit Congestion Notification (ECN).

Time-Wasting Weakness

A closer look at BBR’s handling of the startup phase (see Part 1) reveals a glaring weakness that is common to many algorithms:

This is a very small portion of a data-sequence diagram of the BBR connection at its start. The stack of short vertical black strips on the left shows the sequence ranges of successive packets against the left-hand scale. A horizontal tick marks the end of a packet; a long tick marks the end of an application block. Strips with red stripes are packets that were lost after being seen by the sniffer. NetData has inferred those losses from the selective acknowledgements (SACKs) drawn on the right-hand side: blue bands indicate the selectively-acknowledged ranges, and red cross-hatched bands indicate the lost data.

The bottom band corresponds to the first lost packet, and near its start a purple vertical strip marks a retransmission. After another round-trip time a second retransmission appears, because a second burst of SACKs, acknowledging other retransmissions, indicated that the first retransmission had been lost. This chart shows that 3 of the first 5 retransmissions were lost, but a single such loss is enough to illustrate the performance weakness: retransmission losses delay the recovery of lost data, wasting both time and bandwidth. Further charts allow us to assess the damage.
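The inference described above, that a retransmission is lost when SACKs acknowledge retransmissions sent after it but not the retransmission itself, can be sketched in Python. This is a simplified, hypothetical illustration rather than NetData's implementation; it resembles the time-ordering idea behind RACK (RFC 8985) but omits RACK's reordering window. The `Retransmission` type, sequence numbers and timings are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Retransmission:
    start: int      # first sequence number of the retransmitted segment
    end: int        # one past its last sequence number
    sent_at: float  # transmission time in seconds


def is_acked(seg, acked_ranges):
    """True if the whole segment lies inside one acknowledged [lo, hi) range."""
    return any(lo <= seg.start and seg.end <= hi for lo, hi in acked_ranges)


def lost_retransmissions(retx, acked_ranges):
    """Presume a retransmission lost if it is still unacknowledged while a
    retransmission sent after it has already been acknowledged."""
    delivered = [r.sent_at for r in retx if is_acked(r, acked_ranges)]
    if not delivered:
        return []  # no later delivery yet, so no evidence of loss
    latest = max(delivered)
    return [r for r in retx if not is_acked(r, acked_ranges) and r.sent_at < latest]


# Hypothetical example: five 1000-byte retransmissions sent 1 ms apart, of
# which only the second and fifth are covered by the latest SACK information.
retx = [Retransmission(s, s + 1000, 0.100 + 0.001 * i)
        for i, s in enumerate([0, 3000, 6000, 9000, 12000])]
acked = [(1000, 6000), (7000, 9000), (10000, 20000)]
print([r.start for r in lost_retransmissions(retx, acked)])  # [0, 6000, 9000]
```

The made-up numbers give three losses out of five retransmissions, the same proportion as on the chart.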

The depiction of SACKs on the above chart shows the sender’s view of what has been lost, accumulated from all the preceding SACKs. The chart below, however, shows only the content of the most recent SACK at any time. Because one packet can usually carry only three selectively-acknowledged ranges (four at most, and three when the TCP timestamp option is also present), ranges reported in earlier SACKs can drop out of later ones and so appear to be lost.
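The difference between the accumulated view and the most-recent-SACK view can be sketched as follows. The sketch keeps a simple scoreboard of SACKed ranges above the cumulative acknowledgement and treats every gap below the highest SACKed byte as lost. That is a simplification (RFC 6675, for instance, only deems a hole lost once enough data above it has been SACKed), and the `Scoreboard` class and example ranges are hypothetical.

```python
def merge(ranges):
    """Merge overlapping or adjacent half-open [lo, hi) sequence ranges."""
    merged = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def holes(cum_ack, sacked):
    """Gaps between the cumulative ack and the highest SACKed byte."""
    gaps, edge = [], cum_ack
    for lo, hi in merge(sacked):
        if hi <= edge:
            continue
        if lo > edge:
            gaps.append((edge, lo))
        edge = max(edge, hi)
    return gaps


class Scoreboard:
    """Sender-side view: SACKed ranges accumulated over every ACK received."""

    def __init__(self):
        self.cum_ack = 0
        self.sacked = []

    def on_ack(self, cum_ack, sack_blocks):
        self.cum_ack = max(self.cum_ack, cum_ack)
        combined = merge(self.sacked + list(sack_blocks))
        # Keep only SACKed data that is still above the cumulative ack.
        self.sacked = [(max(lo, self.cum_ack), hi)
                       for lo, hi in combined if hi > self.cum_ack]

    def lost(self):
        return holes(self.cum_ack, self.sacked)


# Hypothetical ACKs, each carrying three SACK blocks (most recent first).
sb = Scoreboard()
sb.on_ack(0, [(5000, 6000), (3000, 4000), (1000, 2000)])
latest = [(7000, 8000), (5000, 6000), (3000, 4000)]
sb.on_ack(0, latest)
print("accumulated view:", sb.lost())
# [(0, 1000), (2000, 3000), (4000, 5000), (6000, 7000)]
print("latest SACK only:", holes(0, latest))
# [(0, 3000), (4000, 5000), (6000, 7000)]: 1000-2000 wrongly looks lost
```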

The second triangular shape corresponds to the second set of SACKs, which acknowledge retransmitted data. It is similar in size to the first because the first retransmission was lost, so the cumulative acknowledgement could not advance past it.

On the broader sequence chart below, the blue vertical strip highlights an interval of 90 ms in which the sender transmitted no packets, and the bytes-in-flight graph underneath indicates why: the receive window was full.
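Why the window filled can be sketched in a few lines. The receiver's window is anchored at the cumulative acknowledgement, so while a lost retransmission leaves a hole at the left edge, the right edge (cumulative ack plus advertised window) cannot move, however much later data is selectively acknowledged. The function names and the sample values are hypothetical, chosen only to reproduce a 90 ms stall like the one highlighted on the chart.

```python
def usable_window(cum_ack, rwnd, snd_nxt):
    """Bytes the sender may still send: the right edge of the offered window
    (cumulative ack + advertised receive window) minus the next sequence number."""
    return max(0, cum_ack + rwnd - snd_nxt)


def window_stalls(events):
    """events: time-ordered (time_s, cum_ack, rwnd, snd_nxt) samples.
    Returns (start_s, end_s) intervals during which the usable window was zero."""
    stalls, start = [], None
    for t, ack, rwnd, nxt in events:
        blocked = usable_window(ack, rwnd, nxt) == 0
        if blocked and start is None:
            start = t
        elif not blocked and start is not None:
            stalls.append((start, t))
            start = None
    return stalls


# Hypothetical trace: the cumulative ack is stuck at 100 000 because the first
# retransmission was lost; it jumps once a later retransmission fills the hole.
trace = [
    (0.000, 100_000, 65_535, 150_000),
    (0.010, 100_000, 65_535, 165_535),  # window now full
    (0.050, 100_000, 65_535, 165_535),
    (0.100, 160_000, 65_535, 165_535),  # hole filled, ack advances
]
print(window_stalls(trace))  # [(0.01, 0.1)]
```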

The wasted time was a consequence of the first few retransmissions being dropped at the downstream queue, and could almost certainly have been avoided simply by inserting a small pause before the first retransmission to let the queue drain slightly. In our experience, loss of the first retransmission is common, and not surprising when it immediately follows a stream of original packets that is itself suffering a significant percentage of losses. However, the damage is not always as severe, especially if the receive window is large.
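How long a "small pause" needs to be can be estimated from quantities a BBR sender already tracks: its minimum RTT and its bottleneck-bandwidth estimate. The sketch below is a hypothetical heuristic, not a feature of any BBR version. It treats the delay above the minimum RTT as queueing time at the bottleneck, converts that into a backlog in bytes, and sizes the pause so the egress link can drain a chosen fraction of it. All names and numbers are illustrative.

```python
def estimated_backlog_bytes(rtt_s, min_rtt_s, btl_bw_bps):
    """Bytes queued ahead of a packet at the bottleneck, assuming all delay above
    the minimum RTT is queueing delay at a FIFO draining at btl_bw_bps."""
    return max(0.0, rtt_s - min_rtt_s) * btl_bw_bps / 8


def drain_pause_s(backlog_bytes, btl_bw_bps, fraction):
    """Seconds for the egress link to transmit `fraction` of the backlog,
    assuming no new arrivals at the queue during the pause."""
    return fraction * backlog_bytes * 8 / btl_bw_bps


# Hypothetical values: 50 ms minimum RTT, 62 ms current RTT, 20 Mbit/s bottleneck.
backlog = estimated_backlog_bytes(0.062, 0.050, 20e6)  # about 30 000 bytes
pause = drain_pause_s(backlog, 20e6, fraction=0.25)    # about 3 ms
print(f"backlog ~ {backlog:.0f} bytes, pause ~ {pause * 1000:.1f} ms")
```

Draining the whole backlog would take exactly the current queueing delay (here 12 ms), so a pause of a fraction of that is a modest cost compared with the 90 ms stall above.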

Charting

Charts like these are particularly valuable to troubleshooters and make an ideal triage tool: even when they cannot fully identify the cause of a problem, they at least narrow the problem domain. For developers of CAAs, they reveal how well an algorithm performs across a wide range of circumstances. Packet analysis is vital for characterising weaknesses and for revealing any uncontrolled variables that may be distorting test results.

Compared with the IO graphs of other packet analysers, NetData achieves a more accurate picture in two ways: it reflects the effect of every packet on the plotted variable, rather than plotting averages over consecutive time intervals; and for variables such as bandwidth use that must be averaged, it splits the contribution of a packet whose transmission spans the boundary between two adjacent intervals according to the number of bytes transmitted in each.
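The second technique, splitting a packet's bytes between adjacent intervals, can be sketched as below. It assumes each packet is transmitted at a constant rate over a known [start, end) interval, so splitting by time is equivalent to splitting by bytes. The function name is hypothetical and this is not NetData's code. For comparison, naive binning by start time would put all 1000 bytes of the first example packet into interval 0.

```python
from collections import defaultdict


def apportion(packets, bin_width):
    """packets: iterable of (start, end, nbytes), where [start, end) is the time
    over which the packet's bytes were transmitted. Returns {bin_index: bytes},
    splitting each packet's bytes in proportion to its time in each bin."""
    totals = defaultdict(float)
    for start, end, nbytes in packets:
        if end <= start:  # zero-length transmission: credit a single bin
            totals[int(start // bin_width)] += nbytes
            continue
        b = int(start // bin_width)
        while b * bin_width < end:
            lo = max(start, b * bin_width)
            hi = min(end, (b + 1) * bin_width)
            totals[b] += nbytes * (hi - lo) / (end - start)
            b += 1
    return dict(totals)


# A 1000-byte packet straddling the boundary at t = 1, then a 200-byte packet.
print(apportion([(0.5, 1.5, 1000), (1.5, 1.75, 200)], bin_width=1.0))
# {0: 500.0, 1: 700.0}
```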

The latter technique avoids the spikes on the Wireshark graphs of throughput (at left). Furthermore, thanks to queue modelling, NetData can measure the throughput of different connections at different points in the network: at the sniffer; at queue ingress; or at queue egress as above. When there is a backlog (cream area graph), the two throughput graphs (blue and red) should form a mirror image.
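The mirror image follows from a single FIFO queue model: while the queue is never empty, the egress link runs at full speed, so in every interval the connections' egress throughputs sum to the link speed, and one can only rise as the other falls. The sketch below is a hypothetical illustration, not NetData's queue model; the connection names, link speed and packet order are invented, and arrival times are assumed non-negative.

```python
from collections import defaultdict


def egress_bytes_per_bin(arrivals, rate_bps, bin_s):
    """arrivals: (arrival_s, conn_id, nbytes) in arrival order at a FIFO queue
    whose egress link runs at rate_bps. Returns {conn_id: {bin: bytes}} of bytes
    leaving the queue, apportioned over each packet's transmission interval."""
    out = defaultdict(lambda: defaultdict(float))
    link_free = 0.0  # time at which the egress link finishes its current packet
    for t, conn, nbytes in arrivals:
        start = max(t, link_free)              # wait behind earlier packets
        end = start + nbytes * 8 / rate_bps    # serialization on the egress link
        link_free = end
        b = int(start // bin_s)
        while b * bin_s < end:
            lo, hi = max(start, b * bin_s), min(end, (b + 1) * bin_s)
            out[conn][b] += nbytes * (hi - lo) / (end - start)
            b += 1
    return out


# Hypothetical: 8 Mbit/s link (1000 bytes per ms), 4 ms intervals, and twelve
# 1000-byte packets from connections A and B all queued at t = 0.
order = "AAABABAABBAA"
flows = egress_bytes_per_bin([(0.0, c, 1000) for c in order], 8e6, 0.004)
for b in range(3):
    print(b, round(flows["A"][b]), round(flows["B"][b]))
```

In the last interval A falls from 3000 to 2000 bytes exactly as B rises from 1000 to 2000, while their sum stays at the link's 4000 bytes per interval.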

Measuring Bottleneck Speed

If the estimated bottleneck speed – the speed of the queue’s egress link – is accurate, then for each connection there should be a near-constant difference between packet round-trip times and their queue-waiting times, because once the queue-waiting time is subtracted, what remains of a round-trip time is largely fixed path delay. This can be checked by having NetData plot markers for both times against time of day:

NetData plots markers for lost packets against the time-of-day scale at the times that they arrived at the queue, but plots markers for RTT and queueing time at the times that their packets left the queue.
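The different plotting times follow naturally from a queue model: a dropped packet never leaves the queue, so its only meaningful time is its arrival, whereas a packet's RTT and queue-waiting time are known only once it departs. The sketch below models a single tail-drop FIFO with a hypothetical buffer size. It approximates the backlog as a fluid (bytes still to be transmitted) and is an illustration rather than NetData's model. Because the backlog at each drop is computed from the assumed speed, a wrong speed also shifts the queue lengths at which losses appear.

```python
def tail_drop_markers(arrivals, rate_bps, buffer_bytes):
    """arrivals: (arrival_s, nbytes) in arrival order, times >= 0.
    Returns one marker per packet:
      ("lost", arrival_s, backlog_bytes): dropped, plotted at arrival time
      ("queued", departure_s, wait_s): forwarded, plotted at departure time
    """
    markers, link_free = [], 0.0
    for t, nbytes in arrivals:
        # Fluid approximation of the bytes still to be sent when this packet arrives.
        backlog = max(0.0, link_free - t) * rate_bps / 8
        if backlog + nbytes > buffer_bytes:
            markers.append(("lost", t, backlog))
            continue
        start = max(t, link_free)
        link_free = start + nbytes * 8 / rate_bps
        markers.append(("queued", link_free, start - t))
    return markers


# Hypothetical: 8 Mbit/s egress (1000 bytes per ms), 2500-byte buffer,
# four 1000-byte packets arriving together.
for m in tail_drop_markers([(0.0, 1000)] * 4, 8e6, 2500):
    print(m)
```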

On the above chart the bands of markers for RTT (black), relative queue-waiting time (green) and queue length (brown) track each other very closely. This assessment is quite sensitive: the bands diverge rapidly if the assumed speed is even slightly wrong. When the speed of the above queueing model was decreased by only a tenth of a percent, the bands of black and green markers clearly diverged or converged:
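Why such a small error matters can be seen from a FIFO model: while the queue stays busy, each modelled departure is the previous one plus a serialization time computed from the assumed speed, so a relative speed error accumulates over the whole busy period. With the model's speed a fraction e too low, the modelled waiting time drifts by T·e/(1 - e) after T seconds of continuous backlog, about 10 ms after 10 s at e = 0.1%. The sketch below checks this with hypothetical numbers; the names are illustrative.

```python
def fifo_waits(arrivals, rate_bps):
    """Queue-waiting time of each (arrival_s, nbytes) packet in a FIFO whose
    egress link runs at rate_bps (times assumed >= 0)."""
    waits, link_free = [], 0.0
    for t, nbytes in arrivals:
        start = max(t, link_free)
        link_free = start + nbytes * 8 / rate_bps
        waits.append(start - t)
    return waits


def drift_closed_form(busy_s, rel_error):
    """Extra modelled waiting time after busy_s seconds of continuous backlog
    when the model's speed is (1 - rel_error) times the true egress speed."""
    return busy_s * rel_error / (1 - rel_error)


TRUE_BPS = 12e6  # hypothetical egress speed: a 1500-byte packet takes 1 ms
# 10 s of 1500-byte packets arriving exactly at the true speed, so the real
# queue never grows and the correctly modelled waits stay at (almost) zero.
arrivals = [(i * 0.001, 1500) for i in range(10_000)]
exact = fifo_waits(arrivals, TRUE_BPS)
slow = fifo_waits(arrivals, TRUE_BPS * 0.999)  # model speed 0.1% too low

print(f"max wait, exact speed:  {max(exact) * 1000:.6f} ms")
print(f"last wait, speed -0.1%: {slow[-1] * 1000:.3f} ms")
print(f"closed form:            {drift_closed_form(9.999, 0.001) * 1000:.3f} ms")
```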

With an inaccurate speed estimate, packets will appear to be lost at different queue lengths, as above.

Using NetData

The information appearing on a NetData chart passes through three stages on its way from raw packets: first into a database, then into a charting module, and finally onto the charts. Because NetData provides extensive filtering in the last two stages, there is little need to filter the capture file or its analysis.

The dialogue chart has a special role because it is drawn directly from the database and allows the traffic of any client, server, service or application type to be selected graphically for loading into the charting module. The dialogue chart is usually the first chart to be viewed because it not only summarises the captured traffic but also highlights all types of network abnormalities occurring in different dialogues.

NetData Lite is particularly easy to use. Simply drop a capture file on its main window or icon and it will analyse the file and offer to generate any of its charts:

The capture files for all the CAA charts in Parts 1 and 2 contain no extraneous traffic and all their packet records are loaded into the charting module, as in the chart selection above (on left). When the flow chart appears above the timing chart, click its ‘All Connections’ and ‘Format’ buttons (see below) to set the desired queue-modelling controls (as above on right).

The critical controls belong to the grey group below. They specify the destination address of the packets that will reach the downstream queue, and the speed of its egress link. The blue Estimate button assigns a monitored MAC address if not already set, and finds an egress speed by running tests with different speeds assigned to the queueing model until there is a near-constant difference between round-trip times and queueing times.
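The article does not detail the Estimate procedure, so the following is only a hypothetical sketch of the idea it describes: run the queue model at several candidate speeds and keep the one for which RTT minus modelled queue-waiting time is most nearly constant (here, has the smallest standard deviation). The synthetic RTTs are generated from the same model at a "true" speed of 12 Mbit/s, so the search should recover that speed; real captures are noisier, and a practical search would refine the step size around the best candidate.

```python
import statistics


def fifo_waits(arrivals, rate_bps):
    """Queue-waiting time of each (arrival_s, nbytes) packet in a FIFO whose
    egress link runs at rate_bps (times assumed >= 0)."""
    waits, link_free = [], 0.0
    for t, nbytes in arrivals:
        start = max(t, link_free)
        link_free = start + nbytes * 8 / rate_bps
        waits.append(start - t)
    return waits


def spread(arrivals, rtts, rate_bps):
    """Standard deviation of (RTT - modelled queue wait): near zero when the
    assumed egress speed is right, because the remainder is fixed path delay."""
    waits = fifo_waits(arrivals, rate_bps)
    return statistics.pstdev([r - w for r, w in zip(rtts, waits)])


def estimate_speed(arrivals, rtts, candidates_bps):
    """Pick the candidate speed giving the most nearly constant RTT - wait."""
    return min(candidates_bps, key=lambda c: spread(arrivals, rtts, c))


# Synthetic data: a 15 Mbit/s burst of 1500-byte packets into a 12 Mbit/s
# bottleneck, with RTT = 50 ms path delay + true queueing delay.
TRUE_BPS, PATH_DELAY = 12e6, 0.050
arrivals = [(i * 0.0008, 1500) for i in range(200)]
rtts = [PATH_DELAY + w for w in fifo_waits(arrivals, TRUE_BPS)]

candidates = [11e6, 11.5e6, 12e6, 12.5e6, 13e6]
for c in candidates:
    print(f"{c / 1e6:5.1f} Mbit/s: spread {spread(arrivals, rtts, c) * 1000:.3f} ms")
print("estimate:", estimate_speed(arrivals, rtts, candidates) / 1e6, "Mbit/s")
```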

The unused controls in this group support the modelling of packet-shaping and packet-policing, to identify the existence of such schemes within a network. There is popup help on all the controls to explain their use.
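Packet policing is commonly implemented with a token bucket, and a minimal sketch helps show what a policing model looks for. Tokens accrue at the policed rate up to a bucket size; a packet that finds too few tokens is dropped immediately rather than queued, so the typical signature is loss without a matching rise in queueing delay (a shaper, by contrast, queues the excess). This is a generic single-rate policer with hypothetical parameters, not a description of NetData's controls.

```python
def token_bucket_police(packets, rate_bps, bucket_bytes):
    """packets: (arrival_s, nbytes) in arrival order. Returns a list of booleans,
    True where the packet conforms and is forwarded, False where it is dropped.
    Tokens (bytes) accrue at rate_bps / 8 per second, capped at bucket_bytes."""
    tokens, last_t, verdicts = float(bucket_bytes), None, []
    for t, nbytes in packets:
        if last_t is not None:
            tokens = min(bucket_bytes, tokens + (t - last_t) * rate_bps / 8)
        last_t = t
        if nbytes <= tokens:
            tokens -= nbytes        # conforming: spend tokens, forward at once
            verdicts.append(True)
        else:
            verdicts.append(False)  # exceeding: dropped, never queued
    return verdicts


# Hypothetical 8 Mbit/s policer with a 3000-byte bucket: a burst of five
# 1000-byte packets at t = 0, then one more 2 ms later.
burst = [(0.0, 1000)] * 5 + [(0.002, 1000)]
print(token_bucket_police(burst, 8e6, 3000))
# [True, True, True, False, False, True]
```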

Downloading NetData

NetData’s flow chart has many more options than were useful here. To add the overlays one-by-one, investigate your own packet flows, and use NetData’s broad set of application decoders, download the free NetData Lite, with copious documentation, from this FTP account:

Domain name: measureit.serveftp.net

Browser URL: ftp://measureit.serveftp.net/

User name: NetDataLiteP (all letters)

Password: visual!ser (note the exclamation mark)

For further guidance on how to produce these charts refer to the Charting Guide. If you would like any help or the more powerful NetData Pro, email bob@netdata-pro.com

https://thetechfirm.com/
