TCP congestion control algorithms have a tough problem to solve: estimating the available capacity of a network path with no direct knowledge of the network itself. Remember that TCP runs in the end hosts. Estimating capacity well matters because the goal is maximum throughput for a single flow while still sharing network resources fairly with other flows. An idle link is wasted capacity.
TCP is an acknowledgement-clocked protocol: the data receiver periodically sends acknowledgements (ACKs) back to the sender to confirm reception of data, and the amount of unacknowledged data in flight is bounded by the TCP window size. When the sender receives an ACK, it knows that the acknowledged segments have left the network. Using various methods depending on the congestion control algorithm, the sender increases its sending rate until it detects that the maximum network capacity has been reached. Traditional TCPs use packet loss as that signal: if the sender does not receive an acknowledgement, it assumes the unacknowledged data was lost in transit, treats the loss as an indicator that the network is "full", and drastically reduces its sending rate to allow the network buffers to drain. Filling the buffers until loss occurs also increases delay and jitter. Cubic is the current default TCP congestion control algorithm in Linux.
Google developed BBR congestion control as an alternative to the traditional loss-based approach. BBR stands for Bottleneck Bandwidth and Round-trip propagation time. BBR takes a different tack: it periodically increases its sending rate and pays very close attention to how long ACKs take to arrive. If increasing the sending rate does not increase ACK delay, there must be more capacity available in the network, and BBR increases its sending rate further. If ACKs are delayed, that is an indication that the network buffers are starting to fill and the path is likely at capacity. Because it does not treat every loss as congestion, BBR can push high throughput through lossy networks. Many internet users are on radio networks such as cellular and Wi-Fi, which tend to suffer random losses that wired networks don't normally experience, so BBR is typically able to provide a better user experience in these lossy environments. BBR only needs to be enabled on the server (the data sender) to see the benefit.
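Enabling BBR on a Linux server is a small sysctl change. A sketch, assuming a reasonably recent kernel (BBR shipped in 4.9) built with the tcp_bbr module; BBR paces its packets, and the fq queueing discipline is the traditional pairing for that:

```shell
# See which congestion control algorithms this kernel offers
sysctl net.ipv4.tcp_available_congestion_control

# Load the BBR module if it is not built in
modprobe tcp_bbr

# Pair BBR with the fq qdisc for packet pacing, then switch to BBR
sysctl -w net.core.default_qdisc=fq
sysctl -w net.ipv4.tcp_congestion_control=bbr

# Verify the change took effect
sysctl net.ipv4.tcp_congestion_control
```

To persist across reboots, the same two settings go in /etc/sysctl.conf or a drop-in under /etc/sysctl.d/.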
I built a lab out of 3 Raspberry Pis and an Ethernet switch to test throughput with TCP Cubic and BBR under various network conditions using iperf3. One Pi is the iperf sender, one is the receiver, and one is a transit network device. On the transit node I used the Linux Traffic Control (tc) feature to artificially inject various levels of packet loss and latency into the path between the iperf sender and receiver. My original plan was to use two SPAN sessions to capture packets before and after the injected loss and latency, but the resulting packet captures were unexpectedly missing packets; the old 2960S may not be able to keep up with near-line-rate SPAN sessions. For each test I did 3 back-to-back iperf runs and recorded the average of the 3. TN stands for Transit Node.
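The impairments on the transit node were configured with tc's netem qdisc. A sketch of the commands involved; the interface name eth0 and the exact delay/loss values are illustrative, and note that netem applies only to traffic egressing the given interface, so delay added in one direction raises the RTT by that amount:

```shell
# On the transit Pi: add 25 ms of delay and 1% random loss
# to traffic egressing toward the iperf receiver
tc qdisc add dev eth0 root netem delay 25ms loss 1%

# Inspect the current qdisc configuration
tc qdisc show dev eth0

# Adjust parameters on the existing netem qdisc between test runs
tc qdisc change dev eth0 root netem delay 25ms loss 2%

# Remove the impairment when done
tc qdisc del dev eth0 root
```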
As an example, below is the first iperf run with no loss or latency added to the network. The RTT between the iperf sender and receiver is around 0.4 ms and the average bitrate was 818 Mbits/sec. The output of "sysctl net.ipv4.tcp_congestion_control" confirms that TCP Cubic is in use. I conducted a series of tests with Cubic and then repeated them with BBR enabled on the sender.
pi@routerberrypi:~ $ ping -c 5 10.0.0.5
PING 10.0.0.5 (10.0.0.5) 56(84) bytes of data.
64 bytes from 10.0.0.5: icmp_seq=1 ttl=63 time=0.501 ms
64 bytes from 10.0.0.5: icmp_seq=2 ttl=63 time=0.367 ms
64 bytes from 10.0.0.5: icmp_seq=3 ttl=63 time=0.366 ms
64 bytes from 10.0.0.5: icmp_seq=4 ttl=63 time=0.365 ms
64 bytes from 10.0.0.5: icmp_seq=5 ttl=63 time=0.381 ms
--- 10.0.0.5 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 133ms
rtt min/avg/max/mdev = 0.365/0.396/0.501/0.052 ms
pi@routerberrypi:~ $ sysctl net.ipv4.tcp_congestion_control
net.ipv4.tcp_congestion_control = cubic
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 10.0.0.1, port 58736
[ 5] local 10.0.0.5 port 5201 connected to 10.0.0.1 port 58738
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 97.0 MBytes 813 Mbits/sec
[ 5] 1.00-2.00 sec 99.8 MBytes 837 Mbits/sec
[ 5] 2.00-3.00 sec 101 MBytes 843 Mbits/sec
[ 5] 3.00-4.00 sec 86.6 MBytes 725 Mbits/sec
[ 5] 4.00-5.00 sec 89.7 MBytes 755 Mbits/sec
[ 5] 5.00-6.00 sec 100 MBytes 840 Mbits/sec
[ 5] 6.00-7.00 sec 101 MBytes 844 Mbits/sec
[ 5] 7.00-8.00 sec 100 MBytes 842 Mbits/sec
[ 5] 8.00-9.00 sec 100 MBytes 841 Mbits/sec
[ 5] 9.00-10.00 sec 100 MBytes 839 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.00 sec 975 MBytes 818 Mbits/sec receiver
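For reference, a run like the one above only needs a server and a client invocation. On Linux, iperf3 can also pin the congestion control algorithm per test with the -C flag, which avoids flipping the sender's sysctl default between runs; a sketch using the receiver address from the output above:

```shell
# On the receiver Pi: start the iperf3 server
iperf3 -s

# On the sender Pi: 10-second test, explicitly selecting the algorithm
iperf3 -c 10.0.0.5 -t 10 -C cubic
iperf3 -c 10.0.0.5 -t 10 -C bbr
```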
Below is a summary of the results. The Ping Avg RTT column shows the end-to-end latency: the 0.4 ms baseline plus the additional latency I injected with Linux Traffic Control on the transit Pi. Configured Loss Rate indicates the packet loss configured with tc. Cubic and BBR produced similar results with added latency and no configured packet loss. As latency increases, throughput decreases, which is expected since data and acknowledgements take longer to traverse the network. The interesting bit is how much better BBR performs when packet loss is present. The bottom chart compares Cubic and BBR under the same conditions: any time loss was injected, BBR was able to push through it and deliver much higher throughput than Cubic. At 50.4 ms RTT and 1% loss, Cubic managed only 4.15 Mbps while BBR achieved 99.3 Mbps. The results spreadsheet is linked below for download.
A less obvious benefit of BBR is that it should allow for routers with shallower buffers, since it does not push the network until the buffers fill. Because BBR's algorithm is designed to back off when latency increases (a sign of buffers filling up), shallow buffers should be sufficient, which can save cost since routers need less buffer memory. There are downsides to BBR, however. BBR version 1's aggressiveness means it is not nice to loss-based congestion control methods: running side by side, BBR v1 can push Cubic flows out of the way and starve them of bandwidth. BBR v2 was developed to improve on this and share network resources more fairly with non-BBR flows, though its less aggressive behavior likely means it cannot cope as well with high levels of packet loss.
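To put a rough number on "shallow": the classic rule of thumb sizes a bottleneck buffer at one bandwidth-delay product (BDP), the amount of data in flight needed to keep the pipe full. As a hypothetical example (these figures are not from my tests), a 1 Gbit/s link with 50 ms of RTT works out to:

```shell
# BDP = bandwidth (bits/s) * RTT (s), converted to bytes
# 1,000,000,000 bits/s * 50 ms / 8 bits per byte
echo $(( 1000000000 * 50 / 1000 / 8 ))   # prints 6250000, i.e. ~6.25 MB
```

A loss-based algorithm like Cubic will happily fill a buffer that large before backing off; a BBR-dominated path should get by with far less.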
Please do your own testing and research before enabling BBR in your network and do not consider this article or testing as an authoritative source on TCP. This test was conducted in my home lab on Raspberry Pis, one of which does not even have a cooling fan and may be subject to thermal throttling. In the future I hope to do further testing and research on TCP, including testing with jitter. Stay Tuned!