How bufferbloat kills your VOIP and other low-latency communications

2016 September 29
by Daniel Lakeland

So, if you use a VOIP phone, or Skype, or play real-time internet games, or do lots of video chats, you may notice that sometimes your connection becomes "SLOOOW". In particular, there is a large lag between when you do something and when that thing registers on the internet. One symptom is both sides of a conversation talking over each other.

One way to deal with this is QOS traffic shaping, that is, defining different queues for different kinds of packets. I've covered that before on this blog. But one thing that is typically recommended is that you traffic shape only on the up-link, that is, packets you're sending to the internet. Packets you receive from the internet you'd typically just send to their destination machine on your network as fast as possible.

Well, it turns out that this is bad advice. Yes, you should traffic shape your upstream packets, but there is good reason to traffic shape your download packets as well. The reason is bufferbloat.

Somewhere on the other end of your cable service (or DSL or WISP or whatever) is a router that is sending your router packets. That router may have a large buffer in it. If something like a streaming video player on your network requests a big block of video data, the connection between Amazon or whatever and the router on the other end of your last-mile cable could be a lot faster than the cable connection you have. When you request all that video data, the remote router will get an initial flood of data and slam it all into a buffer to be sent to you. Only once that buffer fills up does the router start dropping packets; the dropped packets never get ACKed, and Amazon will eventually slow down. The only problem? If you are expecting some voice packets to come in, they will have to wait in line at the remote router until all that video gets sent to you, and if that buffer is big, they will have to wait in line for maybe 1 to 3 SECONDS. That means several seconds of DEAD AIR on your VOIP phone (or several seconds of NO RESPONSE in your game).
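To put rough numbers on that wait, here is a minimal sketch of the arithmetic: the delay is just the amount of data queued ahead of your voice packet divided by the speed of the last-mile link. The buffer sizes below are hypothetical examples, not measurements of any particular ISP's router; only the 60 Mbps link speed comes from my own connection.

```python
def queue_delay_seconds(buffered_bytes: float, link_mbps: float) -> float:
    """Time a newly arrived packet waits behind buffered_bytes in a FIFO buffer."""
    link_bytes_per_second = link_mbps * 1_000_000 / 8
    return buffered_bytes / link_bytes_per_second


# Hypothetical amounts of video data already queued at the remote router.
for buffered_megabytes in (0.25, 1, 8, 16):
    delay = queue_delay_seconds(buffered_megabytes * 1_000_000, link_mbps=60)
    print(f"{buffered_megabytes:>5} MB queued ahead -> {delay * 1000:.0f} ms of added latency")
```

On a 60 Mbps link, a few megabytes of queued video is already enough to push the wait past a second.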

The solution is to not let that remote router buffer up stuff. To do that, you have to ensure that your router drops packets on the floor quickly, so that Amazon throttles its bandwidth back before the remote buffer fills up. The main way to achieve this is to use QOS and throttle your download speed slightly below what your ISP offers you. Here's how:

Go to the dslreports speed test and run the test with your QOS turned off and no one else using your internet connection. Run it two or three times and see how fast your uplink and downlink speeds are. For each direction, pick the smallest value you see across the tries and multiply it by 0.95. These two numbers are the uplink and downlink speeds you will put into your QOS script.
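The calculation can be sketched in a few lines. The function name and the sample speed-test readings are made up for illustration; substitute the numbers from your own runs.

```python
def shaped_rate_kbps(measured_kbps, percent=95):
    """Take the slowest speed-test result and keep `percent` of it (integer kbit/s)."""
    if not measured_kbps:
        raise ValueError("need at least one speed-test result")
    return min(measured_kbps) * percent // 100


# Hypothetical results from three runs with QOS off, in kbit/s.
download_runs = [61200, 60400, 60900]
upload_runs = [3950, 3900, 4010]

print("download limit for QOS script:", shaped_rate_kbps(download_runs), "kbit/s")
print("upload limit for QOS script:  ", shaped_rate_kbps(upload_runs), "kbit/s")
```

Using the minimum rather than the average is the conservative choice: the shaper only helps if its limit stays below what the link actually delivers.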

Did the dslreports speed test show bufferbloat of several hundred to a thousand milliseconds? Mine did on download, and it was destroying my VOIP performance. I have about a 60 Mbps download connection, but if I had someone pulling video I could have 1-3 second cut-outs in my audio thanks to bufferbloat. Putting 58000 kbit/s in as the download speed of my QOS script dropped bufferbloat measurements from 1000+ ms to 40 ms. Now, when my router receives packets faster than 58000 kbit/s it drops the excess, those packets never get ACKed, and the source automatically slows down its send speed. This prevents a buffer from blowing up in size at my ISP's router, and now I can control my QOS with low latency!
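For intuition about why shaving 5% off the download rate moves the queue out of the ISP's router, here is a toy simulation. It is a sketch under heavy assumptions, not a model of real TCP or of any QOS script: one tick is 1 ms, the sender learns about a dropped packet instantly, halves its rate on loss and otherwise creeps upward, and the buffer sizes are invented. The only figure taken from my setup is the link speed (5 packets of 1500 bytes per ms is 60 Mbps).

```python
def worst_isp_queue(isp_rate, home_rate, isp_buffer, home_buffer, ticks=5000):
    """Toy model: return the deepest ISP-side queue (in packets) seen over the run.

    One tick is 1 ms. Rates are packets per tick; buffers are in packets.
    """
    send_rate = 1.0   # sender's current rate
    isp_queue = 0.0   # packets waiting in the ISP's router
    home_queue = 0.0  # packets waiting in the home router
    worst = 0.0
    for _ in range(ticks):
        dropped = False

        # The sender pushes packets into the ISP buffer; overflow is dropped.
        isp_queue += send_rate
        if isp_queue > isp_buffer:
            isp_queue = isp_buffer
            dropped = True
        worst = max(worst, isp_queue)

        # The ISP drains its buffer down the last-mile link.
        delivered = min(isp_queue, isp_rate)
        isp_queue -= delivered

        # The home router forwards at its shaped rate and drops overflow.
        home_queue += delivered
        home_queue -= min(home_queue, home_rate)
        if home_queue > home_buffer:
            home_queue = home_buffer
            dropped = True

        # Crude TCP-like reaction: halve on loss, otherwise creep upward.
        send_rate = max(1.0, send_rate / 2) if dropped else send_rate + 0.1
    return worst


ISP_RATE = 5.0  # packets per ms; 5 packets of 1500 bytes per ms is 60 Mbps
unshaped = worst_isp_queue(ISP_RATE, home_rate=ISP_RATE, isp_buffer=5000, home_buffer=5)
shaped = worst_isp_queue(ISP_RATE, home_rate=0.95 * ISP_RATE, isp_buffer=5000, home_buffer=5)
print(f"unshaped: worst ISP queue delay about {unshaped / ISP_RATE:.0f} ms")
print(f"shaped:   worst ISP queue delay about {shaped / ISP_RATE:.0f} ms")
```

In the unshaped run the only place a packet can be lost is the ISP's big buffer, so the sender keeps it full. In the shaped run the small home queue overflows first, the sender backs off early, and the ISP buffer never gets a chance to fill. Real connections react more slowly than this toy sender, so treat the output as an illustration of the mechanism, not a prediction.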

You're welcome!

7 Responses
  1. Daniel Lakeland
    September 30, 2016

    With QOS

    immediately after without QOS: (ABYSMAL, couldn't actually complete the tests without debugging an Error 1 message)

    and back to QOS turned on with Upload = 3700 Kbps and Download = 59000 Kbps

    • Daniel Lakeland
      September 30, 2016

      Note that although 59000 and 3700 Kbps are the raw speeds for QOS, the QOS script reserves some of that for high priority packets like voice, so speed-test packets evidently get only about 80% of the max. I am not complaining; raw speed is relatively unimportant compared to having to wait 1 to 2 seconds for voice packets to transit the network.

  2. Daniel Lakeland
    September 30, 2016

    And a speed test run while talking over an echo-test to my asterisk server:

    which is rock solid at the same speeds as when I was doing no voice, and there were no noticeable voice gaps, stuttering, etc. during this test.

  3. October 3, 2016

    I note that not all forms of QoS will rate limit inbound successfully, and that even htb+fq_codel or cake may not hold things down if the upstream buffer is massively oversized.

    • Daniel Lakeland
      October 6, 2016

      Yes, not all QOS software will rate limit inbound traffic. But OpenWRT's standard qos_scripts does do it. If your upstream connection delivers packets at rate R and your router consumes packets at rate r < R then buffers upstream will tend to empty. The reason is that packets your router does not pass along never get ACKed, so ACKs go back upstream at rate r at most, and TCP specifies that the source should reduce its transmit rate until its send rate matches the rate of ACKs. The same is not true if you're getting massive streams of UDP packets... but that's not the typical situation.

      So you can have some spikes and so forth, but as a first approximation, rate limiting your inbound to below the rate that your ISP generally sends to you will keep the ISP buffers relatively short.

      Note also that as far as I can see, the ISP (in my case Charter cable services) uses large buffers specifically to game the speedtest... if they fill up a buffer with a couple thousand packets then they can ensure they can give you a very steady rate that will look good on naive speed tests (I paid for 60 Mbps and I get consistently 61 Mbps!!! wow). But if that means every large download your kids start kills your telephone calls... not good.

      • Daniel Lakeland
        October 6, 2016

        To clarify, it's the combination of QOS limiting the inbound rate, and skipping the VOIP packets ahead in the queue, combined with fq_codel dropping packets on the floor when the buffer gets too big that causes TCP connections to throttle (the packets dropped on the floor never get ACKed so the source throttles its send speed).

        If the buffer just builds up enormously, the packets don't get dropped, the ACKs *do* get sent, and the source never throttles. You need BOTH the QOS to keep the high priority packets jumping forward in the queue, and something like fq_codel (or even just a small packet buffer on your router) to get the full effect.

    • Daniel Lakeland
      October 6, 2016

      Hi Dave, I didn't recognize your name, but I see you're involved in so I assume you know all that stuff I just replied to you with. Hopefully it's useful for some of my other readers, most of whom are more data analysis and statistics oriented than network systems engineers.

      The fq_codel stuff seems quite good, so thanks for your efforts in getting it mainlined into OpenWRT.
