Informative Priors from "masking"

2019 September 4
by Daniel Lakeland

There was a discussion on Andrew Gelman's blog about creating informative priors using "quantities of interest" which are basically functions of the parameters.

His example included some code which sets a prior over parameters a, b by combining several functions of a,b:

model {
  target += normal(y | a + b*x, sigma);  \\ data model
  target += normal(a | 0, 10);           \\ weak prior information on a
  target += normal(b | 0, 10);           \\ weak prior information on b
  target += normal(a + 5*b | 4.5, 0.2);  \\ strong prior information on a + 5*b

Andrew then went on to say: "You should be able to do the same thing if you have information on a nonlinear function of parameters too, but then you need to fix the Jacobian, or maybe there’s some way to do this in Stan."

I disagreed with this idea of "fixing the jacobian". Deep in the comments I discussed how this works and how to understand it vs how to deal with "jacobian corrections" when describing priors in terms of probability measures on a target space. The question of whether you need a Jacobian is determined by the role that the information plays, whether you have absolute knowledge of a probability measure, or relative knowledge of how much an underlying probability density should be modified (ie. "masked" if you're familiar with using masks in Photoshop). I thought I'd post it here so I can refer to it easily:

The first thing you have to understand is what is a Jacobian correction for.

Essentially a Jacobian “correction” allows you to sample in one space A in such a way that you induce a particular given known density on B when B is a known function of A.

if B = F(A) is a one to one and onto mapping (invertible) and you know what density you want B to have (pb(B)), then there is only one density you can define for A which will cause A to be sampled correctly so that B has the right density. Most people might work out that since B = F(A) the density should be pb(F(A))… but instead…

To figure this out, we can use nonstandard analysis, in which dA and dB are infinitesimal numbers, and we can do algebra on them. We will do algebra on *probability values*. Every density must be multiplied by an infinitesimal increment of the value over which the density is defined in order to keep the “units” of probability (densities are probability per unit something).

We want to define a density pa(A) such that any small set of width dA at any given point A* has total probability pb(F(A*)) abs(dB*)

That is, we have the infinitesimal equation:

pa(A) dA = pb(F(A)) abs(dB)

solve for pa(A) = pb(F(A)) abs(dB/dA)

if we said pa(A) = pb(F(A)) we’d be wrong by a factor involving the derivative dB/dA = dF/dA evaluated at A, which is itself a function of A. The absolute value is to ensure that everything remains positive.

abs(dB/dA) is a jacobian “correction” to the pb(F(A)) we derived naively at first.


So, the applicability is when

1) you know the density in one space
2) you want to sample in a different space
3) There is a straightforward transformation between the two spaces

In Andrew’s example, this isn’t the case. We are trying to decide on a probability density *in the space where we’re sampling* A, and we’re not basing it on a known probability density in another space. Instead we’re basing it on

1) Information we have about the plausibility of values of A based on examination of the value of A itself. p(A)

2) Information about the relative plausibility of a given A value after we calculate some function of A… as I said, this is a kind of “mask”. pi(F(A))

Now we’re defining the full density P(A) in terms of some “base” density little p, p(A) and multiplying it by a masking function pi(F(A)) and then dividing by Z where Z is a normalization factor. So the density on space A is *defined* as P(A) = p(A) pi(F(A)) / Z

Notice how if p(A) is a properly normalized density, then pi(F(A)) K is a perfectly good mask function for all values of K, because the K value is eliminated by the normalization constant Z, which changes with K. In other words, pi tells us only *relatively* how “good” a given A value is in terms of the value of its F(A) value. It need not tell us any “absolute” goodness quantity.

Should “how much we like A” depend on how many different possible A values converge to the region F(A)? I think this seems wrong. If you have familiarity with photoshop, think like this: you want to mask away a “green screen”, should whether you mask a given pixel depend on “how green that pixel is” or “how many total green pixels there are”? The *first* notion is the masking notion I’m talking about, it’s local information about what is going on in vicinity of A, the second notion is the probability notion: how much total spatial locations get mapped to “green” that’s “probability of being green”

For example pi(F(A)) = 1 is a perfectly good mask, it says “all values of F(A) are equally good” (it doesn’t matter what color the pixel is). Clearly pi is not a probability density, since you can’t normalize that. You could also say “A is totally implausible if it results in negative F(A)” (if the pixel is green don’t include it) so then pi(F(A)) = 1 for all F(A) >= 0 and 0 otherwise is a good mask function as well. It’s also not normalizable in general.

If you start including “jacobian corrections” in your mask function, then your mask function isn’t telling you information like “mask this based on how green it is” it’s telling you some mixed up information instead that involves “how rapidly varying the “greenness measurement” is in the vicinity of this level of greenness”. This isn't the nature of the information you have, and so you shouldn't just blindly think that because you're doing a nonlinear transform of a parameter, that you're obligated to start taking derivatives and calculating Jacobians.

Emile Borel, Leon Bruillouin, The Butterfly Effect, and the illogic of Null Hypothesis Significance Testing

2019 August 19
by Daniel Lakeland

I first read about this idea in Raghuveer Parthasarathy's blog. And I've used it in several places in arguments, but hadn't looked up the source.

The idea is this, about 1914 or so Emile Borel calculated that a perturbation corresponding to a single gram of matter moving in a certain way in a star several lightyears away would perturb the individual trajectories of the molecules of an ideal gas in a lab in such a way as to cause those trajectories to diverge from the ones you would calculate without the perturbation so that after a few seconds any given molecule would have diverged dramatically from your calculated trajectory. (To find citations on this I relied on this blog which ultimately says that the basic idea was described in Leon Brillouin's 1964 book "Scientific Uncertainty and Information" (pg 125) where he cites Emile Borel's 1914 book "Introduction géométrique a quelques théories physiques" (p 94) for Borel's calculation.)

This is a manifestation of the basic idea of Sensitive Dependence on Initial Conditions (aka the Butterfly Effect). A measure of this phenomenon is called the Lyapunov Exponent. Basically for small times after the initial point where your model begins to track the molecules, the error in your calculation grows exponentially like exp(\lambda t) for some \lambda. Exponential growth is of course very fast, so whenever \lambda is positive then your calculation is useless after some amount of time because there is always some error in your calculation and it rapidly blows up.

What does this have to do with Null Hypothesis Significance Testing (NHST)? Well, a common way to utilize this "paradigm" is to collect some data, experimental or otherwise, do some analysis, test a "Null Hypothesis" that the effect of X on Y is zero on average, and then find that the effect is either "statistically significant" or "non-statistically significant". This is taken as evidence that "X affects Y" or "X does not affect Y".

What the Borel calculation tells us is "everything affects everything else to some extent" and so logically in this NHST statistics there are only "true positives" and "false negatives". If you find a non-significant effect it's just because you didn't collect enough data.

Rethinking this paradigm logically, we should really be asking questions like "how big is the effect of X on Y" and "what other effects are there, and how big are their contributions, so that we can determine whether we need to take X into account in order to make useful description of Y"

Rethinking things in this manner immediately leads you to a Bayesian viewpoint on statistics: find out what regions of X effect space are plausible given our model and our data, and then determine whether we can safely neglect X relative to the other factors in our model.

RED, the random-early-drop QDISC as leaky-bucket

2019 March 12
by Daniel Lakeland

The RED qdisc, which is available in Linux to control network traffic is mostly not used these days in favor of things like codel and fq_codel. Originally it was designed to give feedback to TCP streams by dropping packets randomly before hitting a hard tail-drop, as a way to get the TCP streams to throttle back.

Today, the place where it might get used more often is something like a core router at an ISP because it doesn't require much computation and can maybe handle things like 40Gbps or 100Gbps ethernet.

But I recently had some discussions that led me to think it might have a use in certain circumstances where it wasn't traditionally in use. The particular use I'm considering is for UDP streams carrying realtime traffic. For example, game or phone traffic where a small packet is sent every 10 or 20 ms or so to update the state of the game or provide the next piece of audio data.

Now, if you set up GAMEBW to contain the bandwidth allocated in kbps for your total game traffic, and you assume typical packets are about 220 bytes (as estimated by example data I collected) you can set up a RED queue as follows:

tc qdisc add dev ${DEV} parent 1:20 handle 20: red limit $(($GAMEBW * 70 /8)) min $(($GAMEBW * 20/8)) max $(($GAMEBW * 50/8)) avpkt 220 burst $(($GAMEBW *30/8 / 220)) probability 1.0 bandwidth ${GAMEBW}kbit harddrop


Here the magic numbers 20, 50, 30, etc refer to the time in milliseconds that it would take at GAMEBW to send the data in the queue so far. min at 20ms means that the RED queue acts like a bfifo for the first 20ms worth of data... assuming that this is a leaf qdisc for some bandwidth limiting qdisc, we assume that enough bandwidth is available to ensure that this level of queue rarely occurs. But if there's a burst of traffic for some reason, we might start queuing past a 20ms queue, and then we have some probability to drop packets... which rises at 20ms from 0 linearly up to 100% (probability 1.0) once we hit 50ms of queue (we set limit at 70ms just in case).

If we start to exceed our allotted bandwidth for a while, we build up a queue, but we want to ensure that this queue doesn't build up past some point, in our case 50ms worth of data. If we have 1.5 times the bandwidth allowed coming in, we'll build up to a steady state where 1-1/1.5 = 33% of the packets are dropped, which occurs at 20+(50-20)*.33 = 30ms of queue length.

If a burst of packets lasts for a while, we either need to buffer it and hope that later it will go away, in which case our delay grows without bound until the burst stops... (bufferbloat!) or drop some of the packets and make sure that what goes out is no more than the max bandwidth and try to keep the delay low at the expense of sending only some of the packets.

The RED queue linearly interpolates between 0% drop at min, and 100% drop at max, so that no matter what the incoming over-bandwidth rate is, the delay stays below 50ms in this case.

RED is like a "soft tailed" bfifo, and although the TCP exponentially backs off when there's a little droppage... UDP does no such thing, and so the recommendations in the RED manual page don't really apply here... we want to get up to 100% droppage at our maximum allowable delay level, whereas the RED manual page assumes something like just 1-2% droppage, because TCP will dramatically reduce its bandwidth usage when it detects drops.

You can think of a model of the RED queue as a bucket with a small hole in the bottom, connected to a pipe, and a large hole in the side that dumps water on the ground. Water pours into the top of the bucket, and goes out the bottom. If we pour a little faster, we can build up a queue of water waiting to go down the pipe... up to 20ms of delay and the bucket doesn't leak on the ground.... As we pour even faster... the hole in the side pours some of the water out on the ground (drops packets). Depending on how fast we pour, the height of the water increases, and the side hole dumps water faster... Eventually the water reaches the top of the bucket and 100% of the excess dumps onto the ground. That's basically what's going on. When we turn off the tap... no more than 80ms later we've got an empty bucket.

Alcohol Risk qualitative example

2018 August 31
by Daniel Lakeland

A recent paper came out in The Lancet about alcohol consumption claiming that overall considering all causes, there's no "safe" level of consumption. Of course that made a big splash in the news. It happened that Andrew Gelman also posted another study about alcohol and when I mentioned some serious concerns about this Lancet paper, the study author actually responded on Gelman's blog.

I followed up with some specific concerns, but the main issue is that if you want to get causal inference you must either have randomized assignment to ensure on average zero probabilistic dependence on important unaccounted for effects, or have a *causal model* that mimics the real mechanism at least approximately and is believable. Of course the mechanism of death from diseases is ... life and all its complexities. In particular the time-dependency aspects.

To illustrate how this time dependency is important I created a qualitative differential equation model that is intended to simulate some kind of simple risk model through time. The basics are this:

1 unit of people are born at time t=0, as time goes on (1 unit of time is about 20 years) they slowly die of various causes. There are 3 causes we track through to 6 units of time:

  1. "Heart disease" which we assume begins to affect people around 2.5 units of time, and is reduced by 25% for each dose of alcohol, assumed valid only for small doses, such as less than 2.
  2. "Cancer" which we assume begins to affect people later, around 3 units of time, and is completely unaffected by alcohol (just for illustration purposes).
  3. "Other causes" which ramps up somewhat slowly between about 3 and 5 units of time but ensures that very few people really make it past 5 units or so (~100 yrs). Other causes are unaffected by alcohol.

Code is below, but first, the graph which shows overall survival, cumulative heart disease deaths, and cumulative cancer deaths through time for two groups: Red = no alcohol, and Blue = 1 dose of alcohol daily

Qualitative model of Alcohol risk and deaths

Notice that by construction the people who take 1 dose of alcohol have lower instantaneous heart disease risk, and the same cancer and other cause risk. This means they live longer (blue survival curve on top is above red). But if you look at the individual causes, you see a reduced heart disease risk, but in cancer you see there are MORE deaths? By construction, alcohol doesn't change your cancer risk in this model... So why do more alcohol drinkers die of cancer? The answer is that they live longer on average because they don't die of heart disease as much... So eventually they die of cancer, or "other causes".

At the very minimum you need to reproduce this kind of thing in your model, or you have no hope of teasing out how things work.

Now for the code:


The Build Out Line in US Youth Soccer: A Terrible Idea full of unintended adverse consequences

2018 July 24
by Daniel Lakeland

The US Soccer Federation has created a new set of rules for youth games. These games are played on reduced sized fields with sides of 7 players (7 on 7). The rules include a new line on the field called the "build out line" which is half-way between the top of the goalkeeper box and the midfield line. When the goalkeeper receives the ball in his hands, the rule is that play must stop, and the opponents must move out past the build-out line before the goalkeeper can re-start play. I've seen websites claiming that a quick-restart is still allowed, but the rules adopted from US Soccer by CalSouth are actually explicit that this is not allowed

"once the opposing team is behind the build out line, the goalkeeper can pass, throw or roll the ball into play (punts and drop kicks are not allowed)" (bold emphasis is mine, italics is theirs)

Now, whenever you create a rule in a game, you modify the tactics that work well in that game. The game is the rules and everything that the rules imply about what will work well to win. The point of a game is to make an honest attempt to win. If we're just trying to work out certain limited skills, it's called practice. Furthermore, the players want to do a good job at the game. The players do not want to be obliterated, nor to lose opportunities to score goals "for no reason" (ie. if the rules don't prohibit their tactic).

This new rule does several things to change the effective tactics available, all of them bad:

  1. It eliminates the role of the goalkeeper in breaking up the attack and coordinating a counterattack, forcing the keeper to wait and hold the ball until the opposing team has built up a defense rather than look for the best opportunity and method to distribute the ball. Quick and accurate distribution of the ball to the most well positioned players is one of the primary functions of the keeper in real soccer.
  2. It eliminates the threat of a long-ball by prohibiting punts and drop-kicks, allowing the defending team to push their entire team up to the build-out line to trap the ball in the back third and provide a high pressure defense that quickly turns over into offense in a dangerous area. The reason the defending team can build up to the build-out line is because they don't have to worry about a long ball. The basic concept of tactics is to tune your action to the situation. Tactically, if you think there's no long ball possibility, you can put all the players on front-line defense. The proper response to that tactic by the offense is actually ... the long ball. A game-theoretic correct response is therefore to figure out a hack around the rule prohibiting punts... Namely drop the ball to a fullback and have him punt it. Or take a "goal kick" type kick from the top of the box, neither of which are prohibited, but both of which are completely artificial.
  3. The build-out line is also used as the line past which offside can be called, allowing a team to cherry-pick goals by leaving players essentially a few yards above the top of the box, or for players to sprint ahead on breaks and receive passes well into the defending side's half without being called offside.

All of these tactics were immediately figured out as soon as this rule was put in place, as you can see in the comment section on ussoccer

For example Ryan Gilbert says:

I am a coach of U9 soccer in NJ. Firstly, congratulations on creating a entirely new game at your board meeting. Yesterday was our first game of the season and it was full of penalties and free kicks in front of goal due to this so called 'build-out line'. 30 mins of our 50 min game were actually soccer and all the goals on both teams came from attack taking advantage of the new 'build-out' rules. What a total calamity you've created, a travesty to the beautiful game


All the opposition needs to do is flood the build-out line and to exert maximum pressure on their defense and the ball never leaves the half way line. I don't know what game this is you've created but it's not soccer. This season is now a farce and it must be scrapped as soon as possible.
This is exactly the dynamic I witnessed at my kids soccer tournament this weekend, and which you can see repeatedly called out in other message boards and venues across the internet. Note that this "build-out" line disappears after 2 years of play (under 9 and under 10 are the only ages that have it).
The build-out line is a travesty, it's an amplifier for high pressure attacking teams as they now no longer ever need to defend a long ball, or get into any kind of defensive position, the whole game can be played in the back third of their opponents side. Nor do they really have to take accurate shots on goal. If you don't have enough support, just kick it to the goalkeeper, he'll hold it for you while you build up enough attacking strength at the build-out line... I'm not normally a user of profanity, and when I do use it I really mean it. So, in plain simple terms fuck that rule.
The dynamic that plain and simple does occur is: Whenever the opponent keeper gets the ball, the previously attacking team gets a break to set up a massive high pressure attack just outside the box, lines up like the charge of the light brigade and if 3 on 1 can't win the ball, all you really have to do is slam the ball off an opponent into touch: now you have a throw in right next to the goal and can kick the next goal... over and over.
This rule resulted in something like 12 or 15 goals against my sons' team this weekend. My son is learning tactics of the real game from a good and experienced coach, and it's making his team lose dramatically. Imagine how that makes the players feel? They're asking themselves "why are we doing so badly? what are we doing wrong? how come we can't get the ball out of our side?"
Thanks US Soccer for this "development" initiative. You certainly are developing something, possibly a steaming pile of unintended consequences?

Grant Money For Nothing

2018 May 17
by Daniel Lakeland
Now look at them yo-yo's that's the way you do it
You play the lottery on the point oh five
That ain't workin' that's the way you do it
Grant money for nothin', academic chits for free
Now that ain't workin' that's the way you do it
Lemme tell ya them guys ain't dumb
Maybe get a blister on your control-c finger
Maybe get a blister on your mousing thumb
We got to collect that carefully measured data, controls for all the conditions we see.
We got to run tests with simulated data, run long chains of MCMC...

Understanding HFSC in Linux QoS

2018 January 16
by Daniel Lakeland

So, over the last few years I've been working on getting VOIP phone systems working as flawlessly as possible. It's a surprisingly difficult thing to do. There are many pieces of the puzzle that can go wrong.

One of the important pieces of the puzzle is the network behavior at your router. The issue is that ideally for each call that comes through every 0.02 seconds a new audio packet will arrive and it should be delivered to the phone. If it's delayed by more than 0.02 seconds, it's basically worthless as it's too late to play that bit of audio anyway, the next audio bit needs to be played. This isn't quite true, because SIP phones use jitter buffers, so if once in a while you get some delays it can recover. But jitter buffers add to the overall phone call delay, so with too big of a jitter buffer, conversation participants start to talk over each other. This is a particular problem when calling between VOIP systems and cell phones as cell phones often have fairly long delays.

Well, so far, I've been using FireQOS to set up a quality of service system that manages the queue of packets on my router. It's a pretty nice system, and it does fairly well. But it isn't perfect. In particular, it uses the hierarchical token bucket (HTB) qdisc on Linux. This particular algorithm shares the bandwidth between different classes of traffic using a particular scheme that allows certain classes to borrow bandwidth from other classes that aren't using it.

An alternative is the Hierarchical Fair Service Curve (HFSC) qdisc, and in general, this is much more poorly understood. However I discovered a great set of resources about it, and after reading that explanation I tried it out. The result was substantially more reliable low-jitter connections. Here are the resources: Stackexchange Thread and Tutorial

Here's a very useful scheme you can set up. In addition to this set of classes, you need to also have filters that select which class each packet goes into. That's a separate issue. But suppose you can select your voip packets to go into 1:10, and game packets into 1:20 and say interactive packets like ssh or ping or maybe video streams into 1:30, by default everything goes to 1:40 and packets for long-running file transfers go into 1:50...


tc qdisc del dev ${DEV} root
tc qdisc add dev ${DEV} handle 1: root hfsc default 40
tc class add dev ${DEV} parent 1: classid 1:1 hfsc ls m2 ${BW}kbit ul m2 ${BW}kbit

## voip class
tc class add dev ${DEV} parent 1:1 classid 1:10 hfsc rt m1 $((250*8*$NCALL*2/5))kbit d 5ms m2 $((250*8*$NCALL/20))kbit
## games class
tc class add dev ${DEV} parent 1:1 classid 1:20 hfsc rt  m1 $((250*8*$NCALL*2/5))kbit d 5ms m2 $((250*8*$NCALL/20))kbit

## above classes get pfifo behavior 

## LS classes, interactive
tc class add dev ${DEV} parent 1:1 classid 1:30 hfsc ls  m1 $(($BW * 75/100))kbit d 100ms m2 $(($BW * 2/10))kbit
tc qdisc add dev ${DEV} parent 1:30 handle 30: fq_codel
## default
tc class add dev ${DEV} parent 1:1 classid 1:40 hfsc ls  m1 $(($BW * 20/100))kbit d 100ms m2 $(($BW * 6/10))kbit
tc qdisc add dev ${DEV} parent 1:40 handle 40: fq_codel
## lowprio
tc class add dev ${DEV} parent 1:1 classid 1:50 hfsc ls  m1 $(($BW * 5/100))kbit d 100ms m2 $(($BW * 2/10))kbit
tc qdisc add dev ${DEV} parent 1:50 handle 50: fq_codel

Now, this sets up a top-level HFSC class called 1:1 which has a maximum of $BW output bandwidth. It then sets up 2 real-time queues, and 3 link sharing queues to apportion the bandwidth. I assume 100kbit/s for a phone call, and a similar amount for a "game". I currently am not using the game class.

The voip class is set up using the specification

rt m1 $((250*8*$NCALL*2/5))kbit d 5ms m2 $((250*8*$NCALL/20))kbit

This means "real time" class, with a burst rate of 800*N kbit/s for up to 5milliseconds, and a steady rate of 100 N kbit/s. How does this work?

First off, if I understand this correctly, whenever there is traffic in real-time classes, all link share classes have to wait. Because of that, it makes sense to make sure that the total amount reserved for real-time is a smallish fraction of your total budget. In other words, you can't really expect VOIP calls to work great if you have 1Mbit of upstream speed or less because 800*2 = 1600 kbits/s which exceeds your 1000 kbit/s available. Now, what's the "d 5ms" mean? It means that this "burst" speed is only available for up to 5ms. The idea here is that if your have N calls, and there are 2 packets in the queue per call, you can completely drain the queue by 5ms later. The speed I calculated is: 250 bytes * 8 bits/byte * N calls * 2 packets/call / 5ms. But, we don't want this to go on forever, the goal is just to keep the queues short by draining them quickly. Long run, only 250*8*N/20 kbit/s are in use.

The "burst" speed time-period begins at the point where competition for sending occurs, in other words, when there is more than one real-time queue with a packet. HFSC looks at all the real-time queues and determines which one has the *quickest* packet to send, and sends that. The time it takes to send a packet is packetLength / burstSpeed. Real time packets get bandwidth until one of two things happens: they exceed their total sending allotment, or they drain their queues.

For real time, the total amount you can have sent from time t = 0 to time T=now is m2 * T. You can think of this as a line that increases in time at slope m2. If you had a long period with no calls going on, then the amount of bytes you have sent so far is much smaller than m2*T. This means whenever a packet comes in, it gets sent at the m1 rate until the queue drains. Since the m1 rate drains the queue much faster than it fills up from new VOIP packets. We basically ensure that as each VOIP packet arrives, it gets sent very very quickly and we rarely have more than a few packets in the queue.

Suppose you wanted to use the games class? If you care more about VOIP than games, as you probably should. You'd want to make sure that the m1 for VOIP calls was several times faster than the m1 for games. This means several packets would be sent for VOIP for every packet sent for your game. Let's say for example you need 500 kbit/s for your games, and you want to be able to send 3 game packets within 5ms during burst to drain backlogs. Game packets are say 100 bytes. The m1 speed should be 100*8*3/5 =  480 kbit/s. If you want to send at least 3 VOIP packets PER CALL for each game packet, with 250 byte voip packets, you make m1 = 480*N * 3 * 250/100 = 3600 *N kbit. The m2 rate stays the same, it is what limits the real-time class from completely using ALL the available bandwidth. It's worth pointing out that what matters is really the *ratio* of speeds between the competing classes, not the absolute speed of any class.

Now, what about link-sharing?

Link sharing always goes after real-time, either because real-time queues were totally drained, or because real-time hit its limiting total amount to be sent based on m2 and the time from boot-up to now.

Assuming real-time has drained, let's look at the classes 1:30 1:40 and 1:50

Assume all three classes have packets that arrived just now. We're at the start of a congestion period. Which packet gets sent first? again, its the one that's the fastest to send, but based on the m1 rates for these classes for the first 100ms of congestion. During an initial 100ms period, the class 1:30 can send at 75% of the link rate, the default queue 1:40 can send at 20%, and the low-priority queue can send at 5% speed. This means roughly that if all the packets are similar sized, and there are say 100 packets backlog in each class, the high priority queue will send 75 packets the medium will send 20 packets and the low priority will send 5 packets in a given chunk of time.. provided that chunk is less than 100ms and provided the queue doesn't drain. After the 100ms the rates change, high priority has to slow down to 20%, default priority can send at 60% and low priority can send at 20%.

Lets say the burst speed lets the high priority queue drain, now only default and low priority are competing. The default queue will send 20/5 times as many packets as the low priority. If default drains its queue, then low priority gets 5/5 = 100% of the remaining bandwidth.

And this brings up a good point: limits only apply when there is competition between queues. If default is the only thing that has traffic to send, it gets 100% of the bandwidth. And in general if there are several queues the amount queue i gets is bw[i]/sum(bw[j] for all j).

Once you understand how HFSC really works, which is not trivial without the help of the tutorials... You can design a very good queueing system that handles truly real-time traffic like VOIP or robot feedback control or even games, while also giving good service performance in link-sharing. The link-share definition alone even without the real time, tends to keep latency down on the high priority queue at the expense of increasing latency for traffic that doesn't care, such as your large download of a gigabyte operating system install image or whatever.

Note that if you do a hierarchy that's deeper, only the *leaf* classes get real-time service. It really makes sense to have all your real time classes directly attached to the root class 1:1


Inbound QoS with virtual ethernet and policy routing

2017 December 11
by Daniel Lakeland

Starting in mid october I began working on many projects involving computer networking. When you have a fairly complicated set of computing services you just need to do maintenance on them, and also inevitably there's something wrong with some of them that needs fixing, but it takes concentrated time, so you put it off...

Anyway, I made a bunch of things better, including my SIP based VOIP system, and my internal computer network in my house which is now ipv6 only except for certain legacy devices. Ipv6 solves much of the problems that SIP has with NAT, and so this was an overall win. In the process of all of it, I spent some time updating my routers, and got involved with the LEDE project (a fork of OpenWRT). In the process of that I started helping someone on the forum understand QoS, and learned something about setting up QoS myself.

The problem:

Inbound QoS is normally considered "impossible" because once you have a packet you can't keep it from being sent to you... and so the best thing to do is just forward it as quickly as possible. This is not true. In fact it relies on a mistaken idea of what QoS is all about. QoS is about increasing the "utility" of your network by selectively slowing down things that you care less about so that things that you care more about go faster than they would if you didn't slow down those unimportant things.

The "value" of a stream of packets is related to how important they are to you, not how many of them you get per second. If a download you plan to run all night while you sleep is slowed down so your voice conversation is crystal clear, even though you're passing fewer packets over the wire per second, your quality of service increases.

Inbound QoS in Linux:

In Linux, the receive queue has far fewer features than the send queue on an interface. Also, the receive queue has to make decisions before Linux has sent the packet through iptables and had a chance to use marking or DSCP setting iptables rules (such as in the mangle/PREROUTING table). The logical thing to do with inbound QoS is to put it in the router that routes to the wide internet, and run the packets through the iptables and then put the packets into the LAN output queue with appropriate tags and all the capabilities of the output queue. The problem comes when you have a router with several interfaces. For example maybe it has two separate LANs or it has 2 wifi interfaces and an ethernet interface all bridged together. You can't control the total bandwidth by setting limits on the individual interfaces. You want to control the total bandwidth though, because this is what comes in your WAN pipe.


Suppose you have a bridged LAN with 3 physical interfaces, such as you might on a LEDE/OpenWRT wifi router. Let's call the interfaces wlan0, wlan1, and eth0 and they're bridged into a bridge called br-lan. Let's suppose there's also an interface "wan" where packets come in from the internet.

We can force packets received on wan to be routed to a virtual ethernet pair veth0 and veth1. This is based on where they're received from, and so is part of Linux policy routing. The way a veth works is that anything sent to one of them immediately is received by the other as if they were a pair of ports on your computer with an ethernet patch cable between them. To set this up we can do something like:

ip link add type veth ## by default creates veth0 and veth1
ip link set veth1 promisc on ## might be unnecessary
ip link set veth1 master br-lan ## bridge veth1 into the lan

ip route add default dev veth0 table 100 ## send everything that uses table 100 to veth0
ip rule add iif wan table 100 priority 100 ## if it comes in wan interface use table 100 for routing

Now packets that come in wan go through the iptables where we can have -j DSCP rules that tag them with DSCP values describing their importance, then the packets hit the routing code, the code looks up the rule priority 100 and sees it applies so it uses table 100 to route the packets, and table 100 says to send everything down veth0 towards veth1. Since the packets have been through the iptables before routing, by the time they hit veth0 we can for example use fireqos to set up the output of veth0 to have queues which shape the traffic, in particular they delay and/or drop the traffic we don't care about as much which has less important DSCP tags.

There are some tricks here though. The bridge itself will send packets through the iptables again if we have certain sysctls enabled. This may filter the traffic so that it doesn't go from veth1 into the bridge. If you don't want that you need in /etc/sysctl.conf

net.bridge.bridge-nf-call-arptables = 0
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0

Or if you do want it, you need to set up your iptables to allow the appropriate traffic.

The delay/dropping of packets that are low priority is critical because we may not be able to stop people from sending us packets we've already received, but because of TCP window and the ACK feedback, a short time in the future, if we haven't sent back ACK packets, they will slow down their sending rate. So with TCP in the mix, shaping your inbound packet stream results in feedback going back to the sender in the form of delays in acknowledgement and then slowdowns on their send rate. Explicit congestion notification can also help here. So it's not good to just forward your packet as fast as possible on inbound... you want to give that feedback in the form of delayed ACKs to tell your other party to slow down and open up the channel because you're using some of it for something else.

The result is maybe the very beginning of your voice call might have a little garbled audio, but after say 1 second the delays on your download caused at the veth0 output queue result in your download partner throttling the download, and then your audio packets can come in and will be fine for the rest of the call. Similarly for latency sensitive games, providing you're tagging appropriately.

Tagging your high priority traffic with DSCP tags as it comes in your router is also a good idea because downstream inside your LAN you may have managed switches or wifi access points which have WMM queues and they will interpret the higher priority DSCP tags to mean they should provide basic QoS as well. (note all 802.11n or later, including 802.11ac access points use WMM for prioritizing different traffic, this also changes the radio-collision algorithms used)

Because of the way linux wifi drivers work, it's a good idea to use DSCP tag CS6 = decimal 48 on your highest priority traffic, such as voice RTP traffic. This isn't the standard, the standard for voice seems to be EF = decimal 46 but this doesn't in general cause Linux drivers to use the WMM Voice queue. So setting up DSCP 48 on your softphones and retagging to 48 as packets come in from the internet is probably best.

Hopefully this helps some people. I'll be linking this post to FireQOS issues so people can find it that way.


Something interesting happened last May in ipv6 land

2017 November 8
by Daniel Lakeland

Here's a chart from Akamai:

Akamai ipv6 in the US

I think this might have had to do with Charter / Spectrum or whatever they are called now after all the mergers. They enabled native IPv6 on their home cable internet services some time around there.

What this tells me is that more than 50% of the US traffic will be IPv6 by some time in 2018, and that's a good thing, because it means we'll see a rapid abandonment of ipv4 as it becomes a minority share.

*rubs hands together and cackles maniacally*


IPv6 only networks: experiences

2017 November 6
by Daniel Lakeland

T-Mobile has been ipv6 only on their mobile network, at least for Android phones, for quite some time now, at least a year or more. Apple required that all apps in their app store be IPv6 only read by June 2016 or so... the IPv6 transition is in full swing right now. Here's the traffic graph from Google:

Google IPv6 traffic as of 2017-11-06

IPv6 traffic at google globally as of 2017-11-06

I use SIP based phones, and to tell you the truth, they have a lot of problems... most of which are caused by the non-end-to-end nature of modern ipv4. NAT means that if a SIP server needs to open a connection and tell your phone what to do, such as that the other end is putting the call on hold, you are screwed. Instead, your phone has to keep a connection open to the server at all times, and so even a half-second glitch in your wifi can reset TCP connections and lead to dropped calls. (this is particularly true for SIP over TCP or more importantly SIP over TLS... who wants coffee-shop snoopers to record phone calls? Well it's easy with wireshark and an unencrypted SIP call so use TLS + SRTP).

Anyway, other things don't work that well over ipv4 either: ever wanted to reach your home computer from a remote location and pull a file off of it? It's basically impossible. Similarly for screen-sharing/remote control. There are tons of hacks that kinda make this work, but they're insecure and usually involve giving a totally unknown third party control over your computers (some cloud provider) or port-forwarding, and a lot of other hackishness. Third parties employ people, some of whom get disgruntled. Some of whom stalk their disgruntled exes online... Think I'm joking? I'm not. Read the news:  so "fixing" the ipv4 brokenness by buying a "cloud service" that lets you connect to your home network through some proxy server run by Google/Samsung/CloudsRUs or whatever is a terrible idea.

So, it makes lots of sense to want to eliminate IPv4 as soon as possible: for one thing, it makes the internet work again, for another, it's much easier to administrate, for a third, it means you don't have two configurations to manage, finally, it just supports more devices on your network in a trivially easy way, and there are lots of devices these days.

In order to do this, you need to map the entire IPv4 internet into a tiny little corner of the IPv6 internet, and have a translator. This is called NAT64 and DNS64. Google provides a DNS64 service:  so that takes care of that, you just need to advertise their servers, or have a caching server that uses them as upstream. And, to do the protocol mapping, you need Tayga on your router:  problem solved. This is the technique that T-Mobile uses to make things relatively transparent for your phone.

So, here are my experiences with turning off ipv4 entirely:

  1. SIP calls were broken initially. Unfortunately none of the sip providers do ipv6, so you still need ipv4 connectivity, and more to the point many ATAs don't do IPv6 so that becomes broken. The way to fix this is via "topology hiding" at a sip proxy like Opensips. I'll give a detailed explanation elsewhere. Once I got this fixed, calls drop a lot less and generally things work better, including better audio quality (no doubt all that carrier grade NAT reduces real-time performance of the switching gear in your ISP, and ipv6 bypasses all that).
  2. NFS4 with kerberos encryption just works exactly the way it should. I can access my fileserver from anywhere on the ipv6 web.
  3. I still can't figure out how to get OpenVPN to push ipv6 routes to clients properly. But it does work in so far as it recognizes the ipv6 commands and spits out meaningful logs. I just didn't get the config right.
  4. FireTV Stick is flaky. It seems like if you're broadcasting a DNS entry on your router advertisements, the FireTV stick decides it doesn't have an internet connection. This is true even if you turn on ipv4 for it and hand it a DHCP lease. Hmm... but if you leave it alone for several hours, magically it starts working fine. Clearly a software update is in order Amazon.
  5. Kindle Fire tablets work fine. Flawlessly for all their basic functions.
  6. Minecraft pocket edition on the Kindle Fire works fine on its own, but doesn't do any LAN games. If you give it an ipv4 it works for LAN games. Sigh. This is a known issue apparently. I suspect it will get better as the new, one month old code base gets ironed out. Also, using a server is recommended, but the new code base broke all the servers. Wait a few months if you play minecraft a lot (or your kids do, whatever). New code base seems to demand an XBox One login even if you want to play on your own server. No thanks Microsoft. I consider these things bugs, or worse yet the XBox login thing is malicious spyware behavior.
  7. Linux desktops, MacOS desktops, and Windows Desktops all work totally flawlessly, including raspberry pi. My wife who is a mac user, didn't know I turned off ipv4 until a week afterwards when I told her. Of course she knew I broke the phone system, but I've done that before. Her Mac just worked. There were a few glitches but they turned out to be unrelated (Right about the same time, Google wanted her to upgrade from the "Drive" client to the "Sync" client... so her Drive stopped working, but not because ipv6 just because of google's new system).
  8. Android phones work flawlessly. In fact they seem to work better under ipv6 only to tell you the truth. I think this is the end-to-end nature of the network, no NAT bullshit breaking your communications.

So, if you want to replicate this experiment and ditch the broken old internet protocol here's what you need to replicate my setup:

  • A router running linux.
  • Firehol firewall software.
  • DNSmasq LAN management software (does DNS caching, DHCP, SLAAC, router advertisements, DHCPv6, local DNS for your LAN, etc).
  • Tayga on the router to do NAT64
  • The google DNS64 server info on the above linked page.
  • Some knowledge of how ipv6 works.
  • An ISP that provides ipv6 natively: ATT, Comcast, Spectrum/Charter/TWC, and Cox all do this to a substantial degree.
  • Wide DHCPv6 client to request prefixes from your ISP
  • Alternatively, get a router that runs LEDE (the more up-to-date project that came from OpenWRT). It handles ipv6 pretty much flawlessly out of the box, I just have slightly more requirements for my router than average, so I run it on an Intel machine running Debian.