Understanding HFSC in Linux QoS

2018 January 16
by Daniel Lakeland

So, over the last few years I've been working on getting VOIP phone systems working as flawlessly as possible. It's a surprisingly difficult thing to do. There are many pieces of the puzzle that can go wrong.

One of the important pieces of the puzzle is the network behavior at your router. The issue is that ideally for each call that comes through every 0.02 seconds a new audio packet will arrive and it should be delivered to the phone. If it's delayed by more than 0.02 seconds, it's basically worthless as it's too late to play that bit of audio anyway, the next audio bit needs to be played. This isn't quite true, because SIP phones use jitter buffers, so if once in a while you get some delays it can recover. But jitter buffers add to the overall phone call delay, so with too big of a jitter buffer, conversation participants start to talk over each other. This is a particular problem when calling between VOIP systems and cell phones as cell phones often have fairly long delays.

Well, so far, I've been using FireQOS to set up a quality of service system that manages the queue of packets on my router. It's a pretty nice system, and it does fairly well. But it isn't perfect. In particular, it uses the hierarchical token bucket (HTB) qdisc on Linux. This particular algorithm shares the bandwidth between different classes of traffic using a particular scheme that allows certain classes to borrow bandwidth from other classes that aren't using it.

An alternative is the Hierarchical Fair Service Curve (HFSC) qdisc, and in general, this is much more poorly understood. However I discovered a great set of resources about it, and after reading that explanation I tried it out. The result was substantially more reliable low-jitter connections. Here are the resources: Stackexchange Thread and Tutorial

Here's a very useful scheme you can set up. In addition to this set of classes, you need to also have filters that select which class each packet goes into. That's a separate issue. But suppose you can select your voip packets to go into 1:10, and game packets into 1:20 and say interactive packets like ssh or ping or maybe video streams into 1:30, by default everything goes to 1:40 and packets for long-running file transfers go into 1:50...

    DEV=$1
    BW=$2
    NCALL=$3



tc qdisc del dev ${DEV} root
tc qdisc add dev ${DEV} handle 1: root hfsc default 40
tc class add dev ${DEV} parent 1: classid 1:1 hfsc ls m2 ${BW}kbit ul m2 ${BW}kbit

## voip class
tc class add dev ${DEV} parent 1:1 classid 1:10 hfsc rt m1 $((250*8*$NCALL*2/5))kbit d 5ms m2 $((250*8*$NCALL/20))kbit
## games class
tc class add dev ${DEV} parent 1:1 classid 1:20 hfsc rt  m1 $((250*8*$NCALL*2/5))kbit d 5ms m2 $((250*8*$NCALL/20))kbit

## above classes get pfifo behavior 

## LS classes, interactive
tc class add dev ${DEV} parent 1:1 classid 1:30 hfsc ls  m1 $(($BW * 75/100))kbit d 100ms m2 $(($BW * 2/10))kbit
tc qdisc add dev ${DEV} parent 1:30 handle 30: fq_codel
## default
tc class add dev ${DEV} parent 1:1 classid 1:40 hfsc ls  m1 $(($BW * 20/100))kbit d 100ms m2 $(($BW * 6/10))kbit
tc qdisc add dev ${DEV} parent 1:40 handle 40: fq_codel
## lowprio
tc class add dev ${DEV} parent 1:1 classid 1:50 hfsc ls  m1 $(($BW * 5/100))kbit d 100ms m2 $(($BW * 2/10))kbit
tc qdisc add dev ${DEV} parent 1:50 handle 50: fq_codel

Now, this sets up a top-level HFSC class called 1:1 which has a maximum of $BW output bandwidth. It then sets up 2 real-time queues, and 3 link sharing queues to apportion the bandwidth. I assume 100kbit/s for a phone call, and a similar amount for a "game". I currently am not using the game class.

The voip class is set up using the specification

rt m1 $((250*8*$NCALL*2/5))kbit d 5ms m2 $((250*8*$NCALL/20))kbit

This means "real time" class, with a burst rate of 800*N kbit/s for up to 5milliseconds, and a steady rate of 100 N kbit/s. How does this work?

First off, if I understand this correctly, whenever there is traffic in real-time classes, all link share classes have to wait. Because of that, it makes sense to make sure that the total amount reserved for real-time is a smallish fraction of your total budget. In other words, you can't really expect VOIP calls to work great if you have 1Mbit of upstream speed or less because 800*2 = 1600 kbits/s which exceeds your 1000 kbit/s available. Now, what's the "d 5ms" mean? It means that this "burst" speed is only available for up to 5ms. The idea here is that if your have N calls, and there are 2 packets in the queue per call, you can completely drain the queue by 5ms later. The speed I calculated is: 250 bytes * 8 bits/byte * N calls * 2 packets/call / 5ms. But, we don't want this to go on forever, the goal is just to keep the queues short by draining them quickly. Long run, only 250*8*N/20 kbit/s are in use.

The "burst" speed time-period begins at the point where competition for sending occurs, in other words, when there is more than one real-time queue with a packet. HFSC looks at all the real-time queues and determines which one has the *quickest* packet to send, and sends that. The time it takes to send a packet is packetLength / burstSpeed. Real time packets get bandwidth until one of two things happens: they exceed their total sending allotment, or they drain their queues.

For real time, the total amount you can have sent from time t = 0 to time T=now is m2 * T. You can think of this as a line that increases in time at slope m2. If you had a long period with no calls going on, then the amount of bytes you have sent so far is much smaller than m2*T. This means whenever a packet comes in, it gets sent at the m1 rate until the queue drains. Since the m1 rate drains the queue much faster than it fills up from new VOIP packets. We basically ensure that as each VOIP packet arrives, it gets sent very very quickly and we rarely have more than a few packets in the queue.

Suppose you wanted to use the games class? If you care more about VOIP than games, as you probably should. You'd want to make sure that the m1 for VOIP calls was several times faster than the m1 for games. This means several packets would be sent for VOIP for every packet sent for your game. Let's say for example you need 500 kbit/s for your games, and you want to be able to send 3 game packets within 5ms during burst to drain backlogs. Game packets are say 100 bytes. The m1 speed should be 100*8*3/5 =  480 kbit/s. If you want to send at least 3 VOIP packets PER CALL for each game packet, with 250 byte voip packets, you make m1 = 480*N * 3 * 250/100 = 3600 *N kbit. The m2 rate stays the same, it is what limits the real-time class from completely using ALL the available bandwidth. It's worth pointing out that what matters is really the *ratio* of speeds between the competing classes, not the absolute speed of any class.

Now, what about link-sharing?

Link sharing always goes after real-time, either because real-time queues were totally drained, or because real-time hit its limiting total amount to be sent based on m2 and the time from boot-up to now.

Assuming real-time has drained, let's look at the classes 1:30 1:40 and 1:50

Assume all three classes have packets that arrived just now. We're at the start of a congestion period. Which packet gets sent first? again, its the one that's the fastest to send, but based on the m1 rates for these classes for the first 100ms of congestion. During an initial 100ms period, the class 1:30 can send at 75% of the link rate, the default queue 1:40 can send at 20%, and the low-priority queue can send at 5% speed. This means roughly that if all the packets are similar sized, and there are say 100 packets backlog in each class, the high priority queue will send 75 packets the medium will send 20 packets and the low priority will send 5 packets in a given chunk of time.. provided that chunk is less than 100ms and provided the queue doesn't drain. After the 100ms the rates change, high priority has to slow down to 20%, default priority can send at 60% and low priority can send at 20%.

Lets say the burst speed lets the high priority queue drain, now only default and low priority are competing. The default queue will send 20/5 times as many packets as the low priority. If default drains its queue, then low priority gets 5/5 = 100% of the remaining bandwidth.

And this brings up a good point: limits only apply when there is competition between queues. If default is the only thing that has traffic to send, it gets 100% of the bandwidth. And in general if there are several queues the amount queue i gets is bw[i]/sum(bw[j] for all j).

Once you understand how HFSC really works, which is not trivial without the help of the tutorials... You can design a very good queueing system that handles truly real-time traffic like VOIP or robot feedback control or even games, while also giving good service performance in link-sharing. The link-share definition alone even without the real time, tends to keep latency down on the high priority queue at the expense of increasing latency for traffic that doesn't care, such as your large download of a gigabyte operating system install image or whatever.

Note that if you do a hierarchy that's deeper, only the *leaf* classes get real-time service. It really makes sense to have all your real time classes directly attached to the root class 1:1

 

No comments yet

Leave a Reply

Note: You can use basic XHTML in your comments. Your email address will never be published.

Subscribe to this comment feed via RSS