The Build Out Line in US Youth Soccer: A Terrible Idea full of unintended adverse consequences

2018 July 24
by Daniel Lakeland

The US Soccer Federation has created a new set of rules for youth games. These games are played on reduced-size fields with sides of 7 players (7 on 7). The rules include a new line on the field called the "build out line," which is halfway between the top of the goalkeeper box and the midfield line. When the goalkeeper receives the ball in his hands, the rule is that play must stop, and the opponents must move out past the build-out line before the goalkeeper can restart play. I've seen websites claiming that a quick restart is still allowed, but the rules adopted from US Soccer by CalSouth are explicit that this is not allowed:

"once the opposing team is behind the build out line, the goalkeeper can pass, throw or roll the ball into play (punts and drop kicks are not allowed)" (bold emphasis is mine, italics is theirs)

Now, whenever you create a rule in a game, you modify the tactics that work well in that game. The game is the rules and everything that the rules imply about what will work well to win. The point of a game is to make an honest attempt to win. If we're just trying to work out certain limited skills, it's called practice. Furthermore, the players want to do a good job at the game. The players do not want to be obliterated, nor to lose opportunities to score goals "for no reason" (i.e. if the rules don't prohibit their tactic).

This new rule does several things to change the effective tactics available, all of them bad:

  1. It eliminates the role of the goalkeeper in breaking up the attack and coordinating a counterattack, forcing the keeper to wait and hold the ball until the opposing team has built up a defense rather than look for the best opportunity and method to distribute the ball. Quick and accurate distribution of the ball to the most well positioned players is one of the primary functions of the keeper in real soccer.
  2. It eliminates the threat of a long ball by prohibiting punts and drop-kicks, allowing the defending team to push their entire team up to the build-out line to trap the ball in the back third and provide a high-pressure defense that quickly turns over into offense in a dangerous area. The reason the defending team can push up to the build-out line is that they don't have to worry about a long ball. The basic concept of tactics is to tune your action to the situation. Tactically, if you think there's no long-ball possibility, you can put all the players on front-line defense. The proper response to that tactic by the offense is actually ... the long ball. A game-theoretically correct response is therefore to figure out a hack around the rule prohibiting punts... namely, drop the ball to a fullback and have him punt it, or take a "goal kick" type kick from the top of the box, neither of which is prohibited, but both of which are completely artificial.
  3. The build-out line is also used as the line past which offside can be called, allowing a team to cherry-pick goals by leaving players essentially a few yards above the top of the box, or for players to sprint ahead on breaks and receive passes well into the defending side's half without being called offside.

All of these tactics were immediately figured out as soon as this rule was put in place, as you can see in the comment section on ussoccer.

For example Ryan Gilbert says:

I am a coach of U9 soccer in NJ. Firstly, congratulations on creating a entirely new game at your board meeting. Yesterday was our first game of the season and it was full of penalties and free kicks in front of goal due to this so called 'build-out line'. 30 mins of our 50 min game were actually soccer and all the goals on both teams came from attack taking advantage of the new 'build-out' rules. What a total calamity you've created, a travesty to the beautiful game


All the opposition needs to do is flood the build-out line and to exert maximum pressure on their defense and the ball never leaves the half way line. I don't know what game this is you've created but it's not soccer. This season is now a farce and it must be scrapped as soon as possible.
This is exactly the dynamic I witnessed at my kids' soccer tournament this weekend, and which you can see repeatedly called out in other message boards and venues across the internet. Note that this "build-out" line disappears after 2 years of play (under-9 and under-10 are the only ages that have it).

The build-out line is a travesty. It's an amplifier for high-pressure attacking teams, as they now no longer ever need to defend a long ball, or get into any kind of defensive position; the whole game can be played in the back third of their opponents' side. Nor do they really have to take accurate shots on goal. If you don't have enough support, just kick it to the goalkeeper, he'll hold it for you while you build up enough attacking strength at the build-out line... I'm not normally a user of profanity, and when I do use it I really mean it. So, in plain simple terms: fuck that rule.

The dynamic that plain and simple does occur is: Whenever the opponent keeper gets the ball, the previously attacking team gets a break to set up a massive high-pressure attack just outside the box, lines up like the charge of the light brigade, and if 3 on 1 can't win the ball, all you really have to do is slam the ball off an opponent into touch: now you have a throw-in right next to the goal and can kick the next goal... over and over.

This rule resulted in something like 12 or 15 goals against my son's team this weekend. My son is learning tactics of the real game from a good and experienced coach, and it's making his team lose dramatically. Imagine how that makes the players feel. They're asking themselves "why are we doing so badly? what are we doing wrong? how come we can't get the ball out of our side?"

Thanks US Soccer for this "development" initiative. You certainly are developing something, possibly a steaming pile of unintended consequences?

Grant Money For Nothing

2018 May 17
by Daniel Lakeland

Now look at them yo-yo's that's the way you do it
You play the lottery on the point oh five
That ain't workin' that's the way you do it
Grant money for nothin', academic chits for free
Now that ain't workin' that's the way you do it
Lemme tell ya them guys ain't dumb
Maybe get a blister on your control-c finger
Maybe get a blister on your mousing thumb
We got to collect that carefully measured data, controls for all the conditions we see.
We got to run tests with simulated data, run long chains of MCMC...

Understanding HFSC in Linux QoS

2018 January 16
by Daniel Lakeland

So, over the last few years I've been working on getting VOIP phone systems working as flawlessly as possible. It's a surprisingly difficult thing to do. There are many pieces of the puzzle that can go wrong.

One of the important pieces of the puzzle is the network behavior at your router. Ideally, for each call that comes through, a new audio packet arrives every 0.02 seconds and should be delivered to the phone immediately. If it's delayed by more than 0.02 seconds, it's basically worthless: it's too late to play that bit of audio anyway, and the next audio bit needs to be played. This isn't quite true, because SIP phones use jitter buffers, so if once in a while you get some delays it can recover. But jitter buffers add to the overall phone call delay, so with too big a jitter buffer, conversation participants start to talk over each other. This is a particular problem when calling between VOIP systems and cell phones, as cell phones often have fairly long delays.
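
To make that concrete, here's the packet-rate arithmetic as a quick shell sketch, assuming the typical 20 ms packetization interval and roughly 250-byte packets (the same figures used in the HFSC rate calculations later in this post):

```shell
# one audio packet every 20 ms -> 50 packets/s per call
PKT_BYTES=250                                     # assumed packet size incl. headers
PKTS_PER_SEC=$((1000 / 20))
KBITS_PER_SEC=$((PKT_BYTES * 8 * PKTS_PER_SEC / 1000))
echo "$PKTS_PER_SEC packets/s, about $KBITS_PER_SEC kbit/s per call"
```

This is where the roughly 100 kbit/s per-call budget used below comes from.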

Well, so far, I've been using FireQOS to set up a quality of service system that manages the queue of packets on my router. It's a pretty nice system, and it does fairly well. But it isn't perfect. In particular, it uses the hierarchical token bucket (HTB) qdisc on Linux. This particular algorithm shares the bandwidth between different classes of traffic using a particular scheme that allows certain classes to borrow bandwidth from other classes that aren't using it.

An alternative is the Hierarchical Fair Service Curve (HFSC) qdisc, and in general, this is much more poorly understood. However I discovered a great set of resources about it, and after reading that explanation I tried it out. The result was substantially more reliable low-jitter connections. Here are the resources: Stackexchange Thread and Tutorial

Here's a very useful scheme you can set up. In addition to this set of classes, you need to also have filters that select which class each packet goes into. That's a separate issue. But suppose you can select your voip packets to go into 1:10, and game packets into 1:20 and say interactive packets like ssh or ping or maybe video streams into 1:30, by default everything goes to 1:40 and packets for long-running file transfers go into 1:50...


## Example parameters (adapt these to your setup): egress device, total
## upstream bandwidth in kbit/s, and number of simultaneous VOIP calls.
DEV=eth0
BW=5000
NCALL=2

tc qdisc del dev ${DEV} root 2>/dev/null  ## clear any existing root qdisc
tc qdisc add dev ${DEV} handle 1: root hfsc default 40
tc class add dev ${DEV} parent 1: classid 1:1 hfsc ls m2 ${BW}kbit ul m2 ${BW}kbit

## voip class
tc class add dev ${DEV} parent 1:1 classid 1:10 hfsc rt m1 $((250*8*$NCALL*2/5))kbit d 5ms m2 $((250*8*$NCALL/20))kbit
## games class
tc class add dev ${DEV} parent 1:1 classid 1:20 hfsc rt  m1 $((250*8*$NCALL*2/5))kbit d 5ms m2 $((250*8*$NCALL/20))kbit

## above classes get pfifo behavior 

## LS classes, interactive
tc class add dev ${DEV} parent 1:1 classid 1:30 hfsc ls  m1 $(($BW * 75/100))kbit d 100ms m2 $(($BW * 2/10))kbit
tc qdisc add dev ${DEV} parent 1:30 handle 30: fq_codel
## default
tc class add dev ${DEV} parent 1:1 classid 1:40 hfsc ls  m1 $(($BW * 20/100))kbit d 100ms m2 $(($BW * 6/10))kbit
tc qdisc add dev ${DEV} parent 1:40 handle 40: fq_codel
## lowprio
tc class add dev ${DEV} parent 1:1 classid 1:50 hfsc ls  m1 $(($BW * 5/100))kbit d 100ms m2 $(($BW * 2/10))kbit
tc qdisc add dev ${DEV} parent 1:50 handle 50: fq_codel

Now, this sets up a top-level HFSC class called 1:1 which has a maximum of $BW output bandwidth. It then sets up 2 real-time queues and 3 link-sharing queues to apportion the bandwidth. I assume 100 kbit/s for a phone call, and a similar amount for a "game". I am not currently using the game class.

The voip class is set up using the specification

rt m1 $((250*8*$NCALL*2/5))kbit d 5ms m2 $((250*8*$NCALL/20))kbit

This means "real time" class, with a burst rate of 800*N kbit/s for up to 5 milliseconds, and a steady rate of 100*N kbit/s. How does this work?

First off, if I understand this correctly, whenever there is traffic in real-time classes, all link-share classes have to wait. Because of that, it makes sense to make sure that the total amount reserved for real-time is a smallish fraction of your total budget. In other words, you can't really expect two VOIP calls to work great if you have 1 Mbit/s of upstream speed or less, because 800*2 = 1600 kbit/s, which exceeds your 1000 kbit/s available. Now, what does the "d 5ms" mean? It means that this "burst" speed is only available for up to 5 ms. The idea here is that if you have N calls, and there are 2 packets in the queue per call, you can completely drain the queue by 5 ms later. The speed I calculated is: 250 bytes * 8 bits/byte * N calls * 2 packets/call / 5 ms. But we don't want this to go on forever; the goal is just to keep the queues short by draining them quickly. Long run, only 250*8*N/20 kbit/s are in use.
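
Plugging in a concrete value makes the service-curve parameters easier to read; a sketch with an assumed NCALL of 2:

```shell
NCALL=2                              # assumed number of simultaneous calls
M1=$((250 * 8 * NCALL * 2 / 5))      # burst: 2 pkts/call of 250 B drained in 5 ms -> 800*N
M2=$((250 * 8 * NCALL / 20))         # steady: one 250 B pkt per 20 ms per call -> 100*N
echo "rt m1 ${M1}kbit d 5ms m2 ${M2}kbit"   # -> rt m1 1600kbit d 5ms m2 200kbit
```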

The "burst" speed time-period begins at the point where competition for sending occurs, in other words, when there is more than one real-time queue with a packet. HFSC looks at all the real-time queues and determines which one has the *quickest* packet to send, and sends that. The time it takes to send a packet is packetLength / burstSpeed. Real time packets get bandwidth until one of two things happens: they exceed their total sending allotment, or they drain their queues.

For real time, the total amount you can have sent from time t = 0 to time T = now is m2 * T. You can think of this as a line that increases in time at slope m2. If you had a long period with no calls going on, then the amount of bytes you have sent so far is much smaller than m2*T. This means whenever a packet comes in, it gets sent at the m1 rate until the queue drains. Since the m1 rate drains the queue much faster than it fills up from new VOIP packets, we basically ensure that as each VOIP packet arrives, it gets sent very quickly, and we rarely have more than a few packets in the queue.

Suppose you wanted to use the games class. If you care more about VOIP than games, as you probably should, you'd want to make sure that the m1 for VOIP calls was several times faster than the m1 for games. This means several packets would be sent for VOIP for every packet sent for your game. Let's say for example you need 500 kbit/s for your games, and you want to be able to send 3 game packets within 5 ms during burst to drain backlogs. Game packets are, say, 100 bytes. The m1 speed should be 100*8*3/5 = 480 kbit/s. If you want to send at least 3 VOIP packets PER CALL for each game packet, with 250-byte VOIP packets, you make m1 = 480*N * 3 * 250/100 = 3600*N kbit/s. The m2 rate stays the same; it is what limits the real-time class from completely using ALL the available bandwidth. It's worth pointing out that what matters is really the *ratio* of speeds between the competing classes, not the absolute speed of any class.
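
The same arithmetic for the hypothetical games class, again with an assumed two calls:

```shell
NCALL=2                                        # assumed number of calls
GAME_M1=$((100 * 8 * 3 / 5))                   # 3 game pkts of 100 B in 5 ms -> 480 kbit/s
VOIP_M1=$((GAME_M1 * NCALL * 3 * 250 / 100))   # 3 voip pkts/call per game pkt, 250 B vs 100 B
echo "game m1 ${GAME_M1}kbit, voip m1 ${VOIP_M1}kbit"   # 480 and 3600*N = 7200
```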

Now, what about link-sharing?

Link sharing always goes after real-time, either because real-time queues were totally drained, or because real-time hit its limiting total amount to be sent based on m2 and the time from boot-up to now.

Assuming real-time has drained, let's look at the classes 1:30, 1:40, and 1:50.

Assume all three classes have packets that arrived just now. We're at the start of a congestion period. Which packet gets sent first? Again, it's the one that's the fastest to send, but based on the m1 rates for these classes for the first 100 ms of congestion. During an initial 100 ms period, the class 1:30 can send at 75% of the link rate, the default queue 1:40 can send at 20%, and the low-priority queue can send at 5% speed. This means, roughly, that if all the packets are similarly sized, and there is say a 100-packet backlog in each class, the high-priority queue will send 75 packets, the medium will send 20 packets, and the low-priority will send 5 packets in a given chunk of time, provided that chunk is less than 100 ms and provided the queue doesn't drain. After the 100 ms, the rates change: high priority has to slow down to 20%, default priority can send at 60%, and low priority can send at 20%.

Let's say the burst speed lets the high-priority queue drain; now only default and low priority are competing. The default queue will send 20/5 = 4 times as many packets as the low-priority one. If default drains its queue, then low priority gets 5/5 = 100% of the remaining bandwidth.

And this brings up a good point: limits only apply when there is competition between queues. If default is the only thing that has traffic to send, it gets 100% of the bandwidth. And in general if there are several queues the amount queue i gets is bw[i]/sum(bw[j] for all j).
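
That proportionality is simple enough to compute directly; a small shell sketch (the function name is just illustrative):

```shell
# share of the link (in percent) that a class gets while competing:
# first argument is bw[i]; the remaining arguments are the bandwidths
# of ALL currently backlogged classes (including class i itself)
share() {
    want=$1; shift
    total=0
    for bw in "$@"; do total=$((total + bw)); done
    echo $((100 * want / total))
}

share 20 20 5    # default (20) vs lowprio (5) -> prints 80
share 5 20 5     # lowprio's slice of the same contest -> prints 20
share 20 20      # default alone -> prints 100
```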

Once you understand how HFSC really works, which is not trivial without the help of the tutorials... you can design a very good queueing system that handles truly real-time traffic like VOIP or robot feedback control or even games, while also giving good performance in link-sharing. The link-share definition alone, even without the real-time, tends to keep latency down on the high-priority queue at the expense of increasing latency for traffic that doesn't care, such as your large download of a gigabyte operating-system install image or whatever.

Note that if you do a hierarchy that's deeper, only the *leaf* classes get real-time service. It really makes sense to have all your real-time classes directly attached to the root class 1:1.


Inbound QoS with virtual ethernet and policy routing

2017 December 11
by Daniel Lakeland

Starting in mid-October I began working on many projects involving computer networking. When you have a fairly complicated set of computing services, you just need to do maintenance on them, and inevitably there's something wrong with some of them that needs fixing, but it takes concentrated time, so you put it off...

Anyway, I made a bunch of things better, including my SIP-based VOIP system and my internal computer network in my house, which is now IPv6-only except for certain legacy devices. IPv6 solves many of the problems that SIP has with NAT, so this was an overall win. In the process of all of it, I spent some time updating my routers and got involved with the LEDE project (a fork of OpenWRT). In the process of that I started helping someone on the forum understand QoS, and learned something about setting up QoS myself.

The problem:

Inbound QoS is normally considered "impossible" because once you have a packet you can't keep it from being sent to you... and so the best thing to do is just forward it as quickly as possible. This is not true. In fact it relies on a mistaken idea of what QoS is all about. QoS is about increasing the "utility" of your network by selectively slowing down things that you care less about so that things that you care more about go faster than they would if you didn't slow down those unimportant things.

The "value" of a stream of packets is related to how important they are to you, not how many of them you get per second. If a download you plan to run all night while you sleep is slowed down so your voice conversation is crystal clear, even though you're passing fewer packets over the wire per second, your quality of service increases.

Inbound QoS in Linux:

In Linux, the receive queue has far fewer features than the send queue on an interface. Also, the receive queue has to make decisions before Linux has sent the packet through iptables and had a chance to use marking or DSCP-setting iptables rules (such as in the mangle/PREROUTING table). The logical thing to do with inbound QoS is to put it in the router that routes to the wide internet, run the packets through the iptables, and then put the packets into the LAN output queue with appropriate tags and all the capabilities of the output queue. The problem comes when you have a router with several interfaces. For example, maybe it has two separate LANs, or it has 2 wifi interfaces and an ethernet interface all bridged together. You can't control the total bandwidth by setting limits on the individual interfaces. You want to control the total bandwidth, though, because this is what comes in your WAN pipe.


Suppose you have a bridged LAN with 3 physical interfaces, such as you might on a LEDE/OpenWRT wifi router. Let's call the interfaces wlan0, wlan1, and eth0 and they're bridged into a bridge called br-lan. Let's suppose there's also an interface "wan" where packets come in from the internet.

We can force packets received on wan to be routed to a virtual ethernet pair veth0 and veth1. This is based on where they're received from, and so is part of Linux policy routing. The way a veth works is that anything sent to one of them immediately is received by the other as if they were a pair of ports on your computer with an ethernet patch cable between them. To set this up we can do something like:

ip link add type veth ## by default creates veth0 and veth1
ip link set veth1 promisc on ## might be unnecessary
ip link set veth1 master br-lan ## bridge veth1 into the lan

ip route add default dev veth0 table 100 ## send everything that uses table 100 to veth0
ip rule add iif wan table 100 priority 100 ## if it comes in wan interface use table 100 for routing

Now packets that come in wan go through the iptables where we can have -j DSCP rules that tag them with DSCP values describing their importance, then the packets hit the routing code, the code looks up the rule priority 100 and sees it applies so it uses table 100 to route the packets, and table 100 says to send everything down veth0 towards veth1. Since the packets have been through the iptables before routing, by the time they hit veth0 we can for example use fireqos to set up the output of veth0 to have queues which shape the traffic, in particular they delay and/or drop the traffic we don't care about as much which has less important DSCP tags.
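
The tagging step itself is ordinary netfilter configuration; a minimal sketch (the wan interface name and the UDP port range for RTP are placeholders you'd adapt to your own SIP/RTP setup):

```shell
# mark inbound RTP-ish traffic as CS6 in mangle/PREROUTING, before routing,
# so the veth0 egress queues can classify on the DSCP field
iptables -t mangle -A PREROUTING -i wan -p udp --dport 10000:20000 \
    -j DSCP --set-dscp-class CS6
```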

There are some tricks here though. The bridge itself will send packets through the iptables again if we have certain sysctls enabled. This may filter the traffic so that it doesn't go from veth1 into the bridge. If you don't want that, you need in /etc/sysctl.conf:

net.bridge.bridge-nf-call-arptables = 0
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0

Or if you do want it, you need to set up your iptables to allow the appropriate traffic.

The delay/dropping of packets that are low priority is critical because we may not be able to stop people from sending us packets we've already received, but because of TCP window and the ACK feedback, a short time in the future, if we haven't sent back ACK packets, they will slow down their sending rate. So with TCP in the mix, shaping your inbound packet stream results in feedback going back to the sender in the form of delays in acknowledgement and then slowdowns on their send rate. Explicit congestion notification can also help here. So it's not good to just forward your packet as fast as possible on inbound... you want to give that feedback in the form of delayed ACKs to tell your other party to slow down and open up the channel because you're using some of it for something else.

The result is maybe the very beginning of your voice call might have a little garbled audio, but after say 1 second the delays on your download caused at the veth0 output queue result in your download partner throttling the download, and then your audio packets can come in and will be fine for the rest of the call. Similarly for latency sensitive games, providing you're tagging appropriately.

Tagging your high priority traffic with DSCP tags as it comes in your router is also a good idea because downstream inside your LAN you may have managed switches or wifi access points which have WMM queues and they will interpret the higher priority DSCP tags to mean they should provide basic QoS as well. (note all 802.11n or later, including 802.11ac access points use WMM for prioritizing different traffic, this also changes the radio-collision algorithms used)

Because of the way linux wifi drivers work, it's a good idea to use DSCP tag CS6 = decimal 48 on your highest priority traffic, such as voice RTP traffic. This isn't the standard, the standard for voice seems to be EF = decimal 46 but this doesn't in general cause Linux drivers to use the WMM Voice queue. So setting up DSCP 48 on your softphones and retagging to 48 as packets come in from the internet is probably best.

Hopefully this helps some people. I'll be linking this post to FireQOS issues so people can find it that way.


Something interesting happened last May in ipv6 land

2017 November 8
by Daniel Lakeland

Here's a chart from Akamai:

[Chart: Akamai IPv6 adoption in the US]

I think this might have had to do with Charter / Spectrum or whatever they are called now after all the mergers. They enabled native IPv6 on their home cable internet services some time around there.

What this tells me is that more than 50% of the US traffic will be IPv6 by some time in 2018, and that's a good thing, because it means we'll see a rapid abandonment of ipv4 as it becomes a minority share.

*rubs hands together and cackles maniacally*


IPv6 only networks: experiences

2017 November 6
by Daniel Lakeland

T-Mobile has been IPv6-only on their mobile network, at least for Android phones, for quite some time now, at least a year or more. Apple required that all apps in their app store be IPv6-ready by June 2016 or so... the IPv6 transition is in full swing right now. Here's the traffic graph from Google:

[Chart: IPv6 traffic at Google globally, as of 2017-11-06]

I use SIP based phones, and to tell you the truth, they have a lot of problems... most of which are caused by the non-end-to-end nature of modern ipv4. NAT means that if a SIP server needs to open a connection and tell your phone what to do, such as that the other end is putting the call on hold, you are screwed. Instead, your phone has to keep a connection open to the server at all times, and so even a half-second glitch in your wifi can reset TCP connections and lead to dropped calls. (this is particularly true for SIP over TCP or more importantly SIP over TLS... who wants coffee-shop snoopers to record phone calls? Well it's easy with wireshark and an unencrypted SIP call so use TLS + SRTP).

Anyway, other things don't work that well over ipv4 either: ever wanted to reach your home computer from a remote location and pull a file off of it? It's basically impossible. Similarly for screen-sharing/remote control. There are tons of hacks that kinda make this work, but they're insecure and usually involve giving a totally unknown third party control over your computers (some cloud provider), or port-forwarding, and a lot of other hackishness. Third parties employ people, some of whom get disgruntled. Some of whom stalk their disgruntled exes online... Think I'm joking? I'm not. Read the news. So "fixing" the ipv4 brokenness by buying a "cloud service" that lets you connect to your home network through some proxy server run by Google/Samsung/CloudsRUs or whatever is a terrible idea.

So, it makes lots of sense to want to eliminate IPv4 as soon as possible: for one thing, it makes the internet work again; for another, it's much easier to administer; for a third, it means you don't have two configurations to manage; finally, it just supports more devices on your network in a trivially easy way, and there are lots of devices these days.

In order to do this, you need to map the entire IPv4 internet into a tiny little corner of the IPv6 internet, and have a translator. This is called NAT64 and DNS64. Google provides a DNS64 service, so that takes care of that: you just need to advertise their servers, or have a caching server that uses them as upstream. And, to do the protocol mapping, you need Tayga on your router; problem solved. This is the technique that T-Mobile uses to make things relatively transparent for your phone.
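
For reference, the Tayga side is a very small config file; a sketch with placeholder addresses (64:ff9b::/96 is the well-known NAT64 prefix that DNS64 synthesizes AAAA records into):

```
# /etc/tayga.conf -- example values, adapt to your network
tun-device nat64
ipv4-addr 192.168.255.1        # tayga's own ipv4 address, an otherwise unused private range
prefix 64:ff9b::/96            # well-known NAT64 prefix
dynamic-pool 192.168.255.0/24  # ipv4 addresses mapped to ipv6-only clients
```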

So, here are my experiences with turning off ipv4 entirely:

  1. SIP calls were broken initially. Unfortunately none of the SIP providers do IPv6, so you still need IPv4 connectivity, and more to the point many ATAs don't do IPv6, so that becomes broken. The way to fix this is via "topology hiding" at a SIP proxy like Opensips. I'll give a detailed explanation elsewhere. Once I got this fixed, calls drop a lot less and generally things work better, including better audio quality (no doubt all that carrier-grade NAT reduces real-time performance of the switching gear in your ISP, and IPv6 bypasses all that).
  2. NFS4 with kerberos encryption just works exactly the way it should. I can access my fileserver from anywhere on the ipv6 web.
  3. I still can't figure out how to get OpenVPN to push ipv6 routes to clients properly. But it does work in so far as it recognizes the ipv6 commands and spits out meaningful logs. I just didn't get the config right.
  4. FireTV Stick is flaky. It seems like if you're broadcasting a DNS entry on your router advertisements, the FireTV stick decides it doesn't have an internet connection. This is true even if you turn on ipv4 for it and hand it a DHCP lease. Hmm... but if you leave it alone for several hours, magically it starts working fine. Clearly a software update is in order Amazon.
  5. Kindle Fire tablets work fine. Flawlessly for all their basic functions.
  6. Minecraft pocket edition on the Kindle Fire works fine on its own, but doesn't do any LAN games. If you give it an ipv4 it works for LAN games. Sigh. This is a known issue apparently. I suspect it will get better as the new, one month old code base gets ironed out. Also, using a server is recommended, but the new code base broke all the servers. Wait a few months if you play minecraft a lot (or your kids do, whatever). New code base seems to demand an XBox One login even if you want to play on your own server. No thanks Microsoft. I consider these things bugs, or worse yet the XBox login thing is malicious spyware behavior.
  7. Linux desktops, MacOS desktops, and Windows desktops all work totally flawlessly, including Raspberry Pi. My wife, who is a Mac user, didn't know I turned off ipv4 until a week afterwards when I told her. Of course she knew I broke the phone system, but I've done that before. Her Mac just worked. There were a few glitches, but they turned out to be unrelated (right about the same time, Google wanted her to upgrade from the "Drive" client to the "Sync" client... so her Drive stopped working, but not because of ipv6, just because of Google's new system).
  8. Android phones work flawlessly. In fact they seem to work better under ipv6 only to tell you the truth. I think this is the end-to-end nature of the network, no NAT bullshit breaking your communications.

So, if you want to replicate this experiment and ditch the broken old internet protocol here's what you need to replicate my setup:

  • A router running linux.
  • Firehol firewall software.
  • DNSmasq LAN management software (does DNS caching, DHCP, SLAAC, router advertisements, DHCPv6, local DNS for your LAN, etc).
  • Tayga on the router to do NAT64
  • The google DNS64 server info on the above linked page.
  • Some knowledge of how ipv6 works.
  • An ISP that provides ipv6 natively: ATT, Comcast, Spectrum/Charter/TWC, and Cox all do this to a substantial degree.
  • WIDE DHCPv6 client (wide-dhcpv6) to request prefixes from your ISP
  • Alternatively, get a router that runs LEDE (the more up-to-date project that came from OpenWRT). It handles ipv6 pretty much flawlessly out of the box, I just have slightly more requirements for my router than average, so I run it on an Intel machine running Debian.


On IPv6 only networks and FireTV Stick

2017 October 17
by Daniel Lakeland

The FireTV stick from Amazon (I have, I think, version 2) is a useful bit of kit. However, while it can use ipv6, it seems as of today (FireOS, October 2017) it must also have an ipv4 connection in order to be happy.

However, even if you give it dual-stack, it can still flake out, and the way it seems to work is it looks like it'll be connected for a few seconds, and then it will show that "home is unavailable" and ask you to check your network settings.

If you go to settings -> about -> network, you will see it has an ipv4 address and everything looks fine! What's up? Look closer and you'll see that it shows a DNS address with ipv6.

The key is that ipv6 can advertise an ipv6 DNS server, and if you aren't also DHCP advertising an ipv4 DNS server, you'll be up a creek.

Using dnsmasq on my network, what fixed this for me was:


where 10.x.x.x was my server/router running dnsmasq

If this is there, even if you have an additional:


where ...... is your ipv6 for dnsmasq, firetv will seem to be happy. This seems to be because it overrides its DNS settings if it hears from dnsmasq who the DNS servers are, and if it doesn't have any ipv4 DNS servers, it just refuses to be happy about its network connection...
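
For anyone reconstructing this: the elided lines are dnsmasq dhcp-option entries. The general shape, with placeholder addresses, is something like:

```
# DHCPv4: advertise an ipv4 DNS server (option 6) -- this is what FireTV needs
dhcp-option=option:dns-server,10.x.x.x
# DHCPv6 equivalent (ipv6 addresses go in brackets)
dhcp-option=option6:dns-server,[fd00::1]
```

Here fd00::1 is purely a placeholder; substitute your own router's IPv6 address.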

Hope that helps someone.


Followup on Implicit Function Theorem / Likelihoods

2017 September 25
by Daniel Lakeland

I think it's important to understand conceptually what is going on in these cases where we have an implicit relationship that data and parameters are supposed to follow, and to know when it is that we need to do some kind of Jacobian corrections.

A Jacobian correction is required when you have a GIVEN probability distribution on space A and you have a transformation from A to B, call it B=F(A) and you want to express a probability distribution on the B space which is *equivalent in every way* to the GIVEN distribution on space A. The distribution on B is called the *push forward* distribution on B. The mnemonic here is that if you have a small neighborhood in A and you "push it forward" through the F function into the B space, it produces a small neighborhood in B, and if you want this to be equivalent in every way, then the measure of the neighborhood on A is going to be forced to be equal to the measure of the pushed-forward neighborhood in B.

GIVEN: A ~ DistA(values)

TRANSFORM: B = F(A)

DERIVE: B ~ DistB(values)

This process requires using DistA, the inverse transform Finv(B) and a Jacobian correction.

Compare this to:

UNKNOWN: A ~ UnknownDistro(Avalues)

TRANSFORM: B = F(A)

GIVEN: B ~ GivenDistB(Bvalues)

Here we don't know what measure we have on the A space, but we know (by a modeling assumption) what measure we have on the B space. If F is an invertible function, then this situation is entirely symmetric with the one above; it's just that *which space the distributional information is given in* is different.

Now, let's add some semantics to all of this.

In the above problem let A be a space in which your data measurements live. Let DistA(values) then be p(A | values) a likelihood factor in your model. Here, you know what the distribution is on your data. So, just use it. But if you insist on transforming your data to some other space, like say taking the log of your data, in order to leave your model unchanged by the fact that you insist on taking the log, you will have to find a DistB which is the push-forward measure of DistA through the transformation B=log(A).
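To make the push-forward concrete, here is a minimal numerical sketch of the log example. The exponential distribution for A is my choice, purely for illustration: the Jacobian-corrected density on B = log(A) is checked against a numerical derivative of the push-forward CDF.

```python
import math

# GIVEN: A ~ Exponential(1), pdf f_A(a) = exp(-a) for a >= 0 (illustrative choice).
# Transformation: B = log(A), so the inverse is A = exp(B).
def f_A(a):
    return math.exp(-a)

def finv(b):          # inverse transform A = exp(B)
    return math.exp(b)

def jacobian(b):      # |dA/dB| = exp(B), the Jacobian correction
    return math.exp(b)

def f_B(b):
    # push-forward density: f_A(Finv(b)) * |dFinv/db|
    return f_A(finv(b)) * jacobian(b)

# Sanity check against a numerical derivative of the push-forward CDF
# P(B <= b) = P(A <= exp(b)) = 1 - exp(-exp(b)).
def cdf_B(b):
    return 1.0 - math.exp(-math.exp(b))

b, h = 0.3, 1e-6
numeric = (cdf_B(b + h) - cdf_B(b - h)) / (2 * h)
assert abs(f_B(b) - numeric) < 1e-6
```

The same check fails if you drop the `jacobian(b)` factor, which is exactly the error the Stan warning (discussed in the next post) is trying to prevent.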

Now, suppose you don't know what likelihood to give for your data values, but you know that if you calculate some complicated function B = F(A) you would be willing to model the results, in the B space, as having a distribution p(B|Parameters) = DistB(Parameters).

Now, if you want to know what measure this implies in the data space, you will have to do the whole change-of-variables rigamarole with Jacobians. The important thing to understand is *what is given vs what is derived*.

Now, let's imagine a situation where you have a non-separable relationship between various data and parameters, which is constant plus error, a typical situation where the implicit function theorem applies. Here x,y are data, and a,b,c are parameters in your model. We'll assume F is a "nice" function of the kind you're likely to write down as part of a modeling exercise, not something really weird which is nowhere differentiable in any of its inputs, or the like. Our model says that there is a relationship between x,y,a,b,c which is a constant plus noise. This relationship will be written:

F(x,y,a,b,c) = 0 + \epsilon

And let's say \epsilon \sim De(C) has GIVEN distribution De(C) where C are some constants (easiest case).

Now suppose that a,b,c have given values, and x,y are measured. Then the quantity on the left of this equation is a number, F(x,y,a,b,c)=3.310 for example. And so 3.310 = \epsilon is data, derived data to be sure, but data nonetheless; for a given a,b,c and measured x,y there is no uncertainty left, it's just a number. By MODELING ASSUMPTION, the probability that this \epsilon would be calculated to be within d\epsilon of 3.310, if the true values of a,b,c were the ones given by the sampler, is De(3.310|C)d\epsilon, where De is a given function.

And so the distribution De(C) is of the form p(\epsilon | a,b,c); it is a *given* likelihood in "epsilon space". Note that x,y are needed to get \epsilon, but they are known data values, and throughout the sampling process they stay constant. So this is really a function L(a,b,c), where a,b,c are the only things that change while you're sampling. Given the data x,y, the post-data distribution on a,b,c is

L(a,b,c) prior(a,b,c)/Z da db dc

Where Z is a normalization factor Z = \int L(a,b,c) prior(a,b,c)da db dc
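Computationally, this likelihood is very direct. Here is a sketch with a made-up non-separable F and a normal distribution standing in for De (both are hypothetical stand-ins for whatever you'd actually write down): x,y stay fixed while (a,b,c) vary, so the function below is exactly the L(a,b,c) in the text.

```python
import math

# Hypothetical toy relation F; any non-separable function of data and
# parameters would do here.
def F(x, y, a, b, c):
    return y * math.exp(-a * x) - b * math.atan(c + y)

# The GIVEN "epsilon space" density De: here epsilon ~ Normal(0, scale).
def log_De(eps, scale=1.0):
    return -0.5 * (eps / scale) ** 2 - math.log(scale * math.sqrt(2 * math.pi))

# The (log) likelihood of (a,b,c): the data x,y are fixed, so computing
# each epsilon and plugging it into De gives a function of a,b,c only.
def log_L(a, b, c, xs, ys):
    return sum(log_De(F(x, y, a, b, c)) for x, y in zip(xs, ys))

xs = [0.1, 0.5, 1.0]   # made-up data
ys = [1.2, 0.9, 0.7]
print(log_L(1.0, 0.3, 0.2, xs, ys))  # just a number for each (a,b,c)
```

Multiplying exp(log_L) by the prior density and normalizing by Z gives the post-data distribution above.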

Now, if you have this given likelihood in epsilon space, and you want to see what the equivalent likelihood is over say y space where we think of y as data we'd like to predict, and x as covariates, and a,b,c as parameter values:

p(y | a,b,c) dy = p(\epsilon(x,y) | a,b,c) \frac{d\epsilon(y)}{dy} dy

Under the assumption that F is sufficiently well behaved that the implicit function theorem gives us a unique differentiable transform from y to epsilon for given x,a,b,c. And d\epsilon(y)/dy is the "Jacobian Correction". Now divide both sides by dy and we have our answer for the density of y (I'm using nonstandard analysis, dy is an infinitesimal number).

The point is, the likelihood is strictly implied to be the push-forward measure of the GIVEN distribution over \epsilon. But the truth is, we don't know the transformation y = f(x,a,b,c,\epsilon) or its inverse. The typical way we'd do predictions would be to make \epsilon_n a parameter with the epsilon distribution, sample, and then take the \epsilon_n values and use an iterative numerical solver to get y values. And so now we have a computational criterion for deciding if F is sufficiently nice: it produces a unique answer (you might be able to extend this to a countable number of possible alternative answers) under iterative numerical solution for y from a given x,a,b,c,\epsilon_n.
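Here is roughly what that prediction step looks like, with the same kind of hypothetical F as above and simple bisection standing in for the iterative solver. For this particular F the niceness criterion holds on the bracket (it is monotone in y there), so the solver finds the unique predicted y.

```python
import math

# Hypothetical implicit relation (stand-in for your model's F).
def F(x, y, a, b, c):
    return y * math.exp(-a * x) - b * math.atan(c + y)

def predict_y(x, a, b, c, eps_n, lo=1e-6, hi=10.0, tol=1e-10):
    """Solve F(x, y, a, b, c) = eps_n for y by bisection."""
    g = lambda y: F(x, y, a, b, c) - eps_n
    assert g(lo) * g(hi) < 0, "bracket must straddle the root"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Given covariate x, sampled parameters a,b,c, and a drawn residual
# eps_n, recover the predicted y; by construction it satisfies the
# implicit relation up to the solver tolerance.
y_pred = predict_y(x=0.5, a=1.0, b=0.3, c=0.2, eps_n=0.1)
assert abs(F(0.5, y_pred, 1.0, 0.3, 0.2) - 0.1) < 1e-6
```

In practice you would run this once per posterior draw of (a, b, c, \epsilon_n) to build up the predictive distribution for y.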


The Implicit Function Theorem and Likelihood Functions

2017 September 22
by Daniel Lakeland

In Bayesian statistical modeling we often use the symbol ~ which denotes a kind of "statistically equal to". Consider the following:

y = ax+b+\epsilon

If \epsilon = 0 then this is an equation of a line, whereas if we say \epsilon \sim N(0,1), for example, then this denotes a line with errors that have a certain range of credible sizes and are centered around 0. Well, this statement about the distribution of \epsilon doesn't alter the algebraic properties of the symbolic expression y=ax+b+\epsilon, and so that equality still respects all the usual algebraic rules.

y-ax-b = \epsilon

Is true, and so y-ax-b \sim N(0,1) is true by substitution of \epsilon.

In general you might have a fairly complicated relationship, something like

F(x,y,a,b) = \epsilon

With F a nonlinear non-separable relationship between the quantities, for example

y^2 -\frac{x}{y}\mathrm{atan}(ay)+\frac{b}{a} = \epsilon

Or something equally nasty from the perspective of trying to, say, "solve for y". We can suppose that y is our data, x a covariate, and a,b are parameters. What do we make of this relationship in a Bayesian model, and how do we use it?

In Stan, if you create a transformed parameter F = y^2-x/y*atan(a*y)+b/a and then say

F ~ normal(0,1)

You will get a message about how the left hand side of this sampling statement contains a transform of a parameter, and how, if it's nonlinear, you need to include the Jacobian of the transformation in the target. This warning message is designed to alert you to something you need to do when you re-parameterize. But a re-parameterization is a purely formal transformation. It doesn't alter the meaning of your model, it alters the way in which the model is expressed. For example if you have y = ax+b and you change this to y/a = x + b/a and then rename y/a = y' and b/a = b' and say y' = x + b', this is a formal transformation that doesn't alter the meaning of the equation (provided a is not 0). On the other hand, if you change to y = ax^2 + b then you're changing your model.

The statement F ~ normal(0,1) above is not a formal transformation, it is in fact a statement about your data, a kind of likelihood, it's just an implicit statement about your data.

Although we can't necessarily solve our equation for y symbolically, there is a theorem called the implicit function theorem which enables us to say that as long as our relationship F(x,y,a,b) is sufficiently well behaved, then in some region around any given point (x',a',b') there exists a function y = f(x,a,b), even if we don't know how to express it. For example, when the distribution for a is well separated from 0, we won't be dividing by a=0, and so our expression F is well behaved. And so, our statement

F ~ normal(0,1) is really a statement about

y-f(x,a,b) = \epsilon

And could be re-expressed as

y \sim N(f(x,a,b),1)

Which for y a data value is obviously the usual kind of likelihood expression. The problem is, although this f function exists, that doesn't mean we know what it is. We do, however, know the relationship F(x,y,a,b) and so why not do

F(x,y,a,b) \sim N(0,1)

Which has exactly the same meaning.
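For a case where we *can* solve for y explicitly, the equivalence is easy to check numerically. With F(x,y,a,b) = y - (ax + b), the implicit statement F ~ normal(0,1) and the explicit y ~ normal(ax + b, 1) contribute identical log-likelihood terms; here dF/dy = 1, so no Jacobian correction arises (the numbers below are arbitrary):

```python
import math

def log_normal(z, mu=0.0, sigma=1.0):
    """Log density of Normal(mu, sigma) at z."""
    return -0.5 * ((z - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

# Arbitrary data point and parameter values, purely for illustration.
x, y, a, b = 0.7, 2.1, 1.5, 0.4

# Implicit form: F(x,y,a,b) = y - (a*x + b), stated as F ~ normal(0,1).
F = y - (a * x + b)

# Explicit form: y ~ normal(a*x + b, 1). The two agree exactly.
assert abs(log_normal(F) - log_normal(y, mu=a * x + b)) < 1e-12
```

For an F that is nonlinear in y, the two forms would differ by exactly the |dF/dy| Jacobian factor discussed in the previous post.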

Note, to the best of our knowledge we have decided to model \epsilon \sim N(0,1) which is a modeling choice, and subject to questions regarding whether it expresses valid facts about the world more than it is subject to questions about mathematical correctness. This fact isn't derived mathematically from anything, it's assumed and so it should be questioned primarily on modeling grounds more than anything else. There can be mathematical facts that are relevant I suppose, but the main question is "is this a good model" not "did you derive the N(0,1) correctly" since it isn't derived.

All of this is another way to think about what I called "declarative models" a while back when I first started thinking about this topic.


On the lack of Lebesgue Measure on countably infinite dimensional spaces, and Nonstandard Analysis

2017 September 18
by Daniel Lakeland

Consider the interval [0,1]; it has length 1. The generalized notion of length on the reals is Lebesgue measure: whenever you have something like a closed interval, so that there's a trivial length for the set, the Lebesgue measure is the same as the length.

Now consider the 2D plane. The square [0,1] \times [0,1] consists of all the points (x,y) where x is in [0,1] and so is y. What is the area? It's 1. This continues to work for integer dimensions 3, 4, etc. What's the volume of the hypercube [0,1]^N for N some large integer like 3105? Again, it's 1^{3105} = 1.

But now let's see what happens when we consider intervals of the form [0,0.5]: the length is 0.5, and for high dimension N the hyper-volume of the hyper-cube is 0.5^N, which goes to zero as N gets big. Similarly for intervals [0,2] the volume is 2^N, which goes to infinity as N gets big.

Intuitively this is why we don't have (standard) Lebesgue measure on the infinite dimensional space. An infinitesimal interval dx is small, but when you calculate dx^N for N nonstandard, the hyper-volume is REALLY small. Similarly for intervals of slightly larger than side 1, the hyper-volume is infinite.

On the other hand, consider the interval [0,1.1]^N for N a nonstandard integer. Sure, the hyper-volume 1.1^N is nonstandard. But, it's a perfectly fine nonstandard number. If this calculation is an intermediate calculation in a series of calculations that eventually leads you to prove some property, there is nothing that keeps you from carrying it out. For example you want to show that one set is much smaller than another, the ratio of sizes is r = 1.1^N/1.2^N for N nonstandard. This ratio is clearly infinitesimal as 1.1/1.2 \approx 0.916667 is a fraction less than 1 and it's raised to a nonstandard power.

But if you have some other infinitesimal ratio, and we want to discern how big they are relative to each other, for example how big is 0.995^{N-K} relative to (1.1/1.2)^N you can do so easily and algebraically. [0.995^{N-K}/(1.1/1.2)^N] \approx 1.0855^{N-K} \times (1.1/1.2)^{-K}.
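Both the hyper-volume behavior and the ratio algebra are easy to check numerically for standard N and K (the particular values below are arbitrary):

```python
import math

# Hyper-volume of [0, s]^N is s**N: equal to 1 only when s == 1.
assert 1.0 ** 3105 == 1.0
assert 0.5 ** 100 < 1e-30      # shrinks toward 0 as N grows
assert 2.0 ** 100 > 1e30       # blows up as N grows

# The ratio identity from the text, checked for standard N, K:
# 0.995**(N-K) / (1.1/1.2)**N == (0.995*1.2/1.1)**(N-K) * (1.1/1.2)**(-K)
N, K = 50, 7
lhs = 0.995 ** (N - K) / (1.1 / 1.2) ** N
rhs = (0.995 * 1.2 / 1.1) ** (N - K) * (1.1 / 1.2) ** (-K)
assert math.isclose(lhs, rhs, rel_tol=1e-12)

print(round(0.995 * 1.2 / 1.1, 4))  # → 1.0855, the base in the text
```

This is exactly the "prove it for all N, K, then plug in the finite standard case" move described in the next paragraph.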

When N and K are nonstandard, you rapidly get either an unlimited or an infinitesimal result. But if you prove that this is true for all N,K, and then need to later consider the finite standard case, say N=1331512 and K=89331, then you have the formula available to you, and you can get a perfectly fine standard value.

This is useful if you're doing something like considering a function of space evaluated at a set of points and you don't know ahead of time exactly how many points. For example, each point might be the location of competing insects, and you're working out a PDE to approximate how these insect populations change in time. The insects come at discrete locations, but the particulars of how many and which locations are not known ahead of time. You can develop a continuous model, in which you have a smooth function of space, and then you've got an "infinite dimensional" model, but the truth is your infinite dimensional model is just a device for calculations approximating a finite but "large N" number of points. It's not helpful to say that "there is no Lebesgue measure on infinite dimensional space" because the property "there is Lebesgue measure on space of finite dimension N for all integer N" is the property you care about. In your model you would only actually ever care about, say, N = a few million to billion. So developing a nonstandard expression makes more sense to the modeler, even though it makes no sense to the pure mathematician trained in classical analysis.