NFS sync/async, some of the issue solved, or how to set vm.dirty_bytes and vm.dirty_background_bytes

2016 October 9
by Daniel Lakeland

tl;dr version: Set the vm.dirty_bytes and vm.dirty_background_bytes sysctls to a quantity based on the bandwidth of your ethernet times a reasonable initial delay period (well under a second). This dramatically reduces the delay before the kernel starts the disks working, and so keeps the client from inflating the total time it takes to flush a file to disk.

So, the pieces are falling into place on this issue. With async on the server's export, the test program returns before everything is actually on disk, so it doesn't actually measure the full time to transfer everything to disk, only the time to get it across the ethernet wire. The test program is:

dd if=/dev/zero of=./MyTestFileOnNFS bs=1M count=1000 conv=fdatasync

That opens the file, writes 1 Gig of zeros and syncs the file. With async on the server, the sync returns immediately before the disk is finished, so of course you wind up getting about 100MB/s which is the bandwidth of the network transfer (gigE).

However, if you put sync on the server, the fsync() at the end of the dd waits until the disk finishes. How fast can the filesystem get files on the disk? With btrfs RAID 10 and my little USB 3.0 enclosure, the above command gives about 66 MB/s on the server itself without NFS in the mix. So, for the moment, we'll suppose that's the max bandwidth the USB drive controller will handle for RAID 10.

Still, over NFS with server=sync, I was getting around 44MB/s on a good run, which is only about 2/3 of the disk maximum, nowhere near the 66 we'd hope for.

Well, it turns out that the kernel will buffer things up in RAM and only start the disks writing after sufficient buffers are full. There are 4 possible sysctl settings involved:

vm.dirty_ratio
vm.dirty_background_ratio
vm.dirty_bytes
vm.dirty_background_bytes

By default, dirty_ratio = 20 and dirty_background_ratio = 10, and the *_bytes options are 0 and therefore ignored. You can only use bytes or ratio, not both; the kernel zeroes out the other one when you set one of these options. The way the defaults work is: when 10% of physical RAM is full of dirty pages, the kernel starts flushing to disk in the background, and when 20% is full it starts throttling programs that are writing, and flushing to disk aggressively.
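You can check what a machine is currently using like this (the output shown here just assumes the defaults described above; yours may differ):

sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_bytes vm.dirty_background_bytes
vm.dirty_ratio = 20
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_background_bytes = 0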

Well, setting these as a fraction of RAM is just a bad idea, especially as RAM prices decline and the RAM in a server can easily be larger than even largish files. So first off, go into /etc/sysctl.conf and put in lines like this:

vm.dirty_bytes =
vm.dirty_background_bytes =

But the question is what to put on the right hand side of the =. How big should these buffers be?

The right way to do this is to think about how much time you're willing to delay starting the flush to disk when a large write comes in. If, for example, you have dirty_background_ratio = 10 and you've got 32 gigs of RAM, then 3.2 GB of data have to come over the wire before the kernel will wake up and flush the disk... which at gigE speeds is about 26 seconds of receiving before anything gets flushed! (The kernel also has a 30 second writeback timeout, which would kick in at nearly the same point.) But still, if transferring a file takes less than that long, then the kernel won't even start putting the file on disk before the xfer completes over the wire. If the disk writes a LOT faster than the wire, this doesn't cause a big problem. But when the disk writes at about the same speed as the wire... you're basically taking twice as long as if you'd just started writing to the disk right away.

So, instead, think about the quantity bandwidth*delay. For gigE, the bandwidth is 1024*1024*1024/8 bytes/s, and a reasonable delay before you wake up and do something about it is probably well under a second; I chose 0.25 seconds. And if a buffer builds up representing about 1.5 seconds of continuously received data, you want to flush that disk aggressively. The result is dirty_bytes = 1024^3/8 * 1.5 and dirty_background_bytes = 1024^3/8 * 0.25:

vm.dirty_bytes = 201326592
vm.dirty_background_bytes = 33554432
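If you'd rather derive those numbers than hard-code them, here's the arithmetic as a little shell sketch (gigE assumed; substitute your own bandwidth and delays):

# gigE payload bandwidth in bytes/s
BW=$((1024 * 1024 * 1024 / 8))
# 1.5 seconds of wire data: the aggressive-flush/throttle threshold
echo "vm.dirty_bytes = $((BW * 3 / 2))"
# 0.25 seconds of wire data: the background-flush threshold
echo "vm.dirty_background_bytes = $((BW / 4))"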

After putting this in /etc/sysctl.conf and running "sysctl -p" to actually set it in the kernel, on both the client and the server, the dd bandwidth for 1000 MB was about 56MB/s instead of 44MB/s, which is now about 85% of the filesystem's local bandwidth. Not bad. There are probably other issues I can tune, but at least we're not sitting around twiddling our thumbs for 5 to 30 seconds before servicing large writes.

So, rather than my earlier explanation, in which I assumed a whole bunch of very short delays accumulated as each 1MB block was synced to disk, what was actually going on was one long initial wait, up to 5 seconds (the btrfs commit interval I set), being added into the total transfer time: the kernel was happy to just put my data into RAM and sit on it until 30 seconds passed, or 1.6 GB accumulated, or the 5 second btrfs commit interval came around. Since I was only transferring 1GB anyway, a lot of delay occurred before writing even started, thereby increasing the total time to get things on disk.

There are quite a few articles online telling you that setting dirty_bytes and dirty_background_bytes is a good idea, but few of them tell you how to think about selecting the buffer sizes. They tend to say things like "try setting it to 100MB and see how that works." By thinking in terms of the bandwidth*delay product, you can make a principled guess at a good size. If you have 10GigE, or are bonding 3 GigE ports, or just have old 100 Mbps ethernet, you'll need completely different quantities, but it will still be a good idea to pick a reasonable delay and start tuning from your bandwidth * delay number.


NFS sync/async still a mystery

2016 October 9
by Daniel Lakeland

It turns out that the "sync on writing each block" explanation from previous posts doesn't work. Specifically, from an email from someone in the know:

No, what you're missing is that neither client nor server actually
performs writes completely synchronously by default.

The client returns from the "write" systemcall without waiting for the
writes to complete on the server.  It also does not wait for one WRITE
call to the server to complete before sending the next; instead it keeps
multiple WRITEs in flight.

An NFS version >=3 server, even in the "sync" case, also does not wait
for data to hit disk before replying to the WRITE systemcall, allowing
it to keep multiple IOs in flight to the disk simultaneously (and
hopefully also allowing it to coalesce contiguous writes to minimize seeks).

So, at least with the default "sync" option on the export, it seems like the NFS server is supposed to do the right thing: batch stuff up in the FS disk cache and wait for a COMMIT request to flush to disk. But this still doesn't explain the massive differences in throughput I'm seeing for sync vs async... so it remains to be seen if I can find the explanation, and hopefully get the info into the right hands to "fix" this. A short packet capture with "sync" vs "async" also showed that there didn't seem to be any difference in the pauses between a WRITE and the acknowledgement from the server (for short files; I wasn't prepared to packet-capture a full gigabyte file transfer).
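For reference, the sort of capture I mean is just something like the following (interface and host names are examples; NFSv4 runs over TCP port 2049):

tcpdump -i eth0 -w nfs-test.pcap host myserver and port 2049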

It seems like the main difference between sync and async on the server is whether a COMMIT request returns immediately or actually waits for things to hit the disk. So you really do want "sync" on your server, because otherwise the server lies about the stability of your file. If that's the only difference, then I really can't explain what's up. Doing a proper test on the raw filesystem locally on the server shows it's capable of about 60-65MB/s, due to needing to do btrfs RAID10 operations down a USB 3.0 connection. Testing an individual drive in the cabinet with hdparm gives 100MB/s. So perhaps the reduction from 60-65 down to 45 MB/s (70-75% of the local filesystem bandwidth) is all you can expect given the NFS stuff in between and two different computers with different amounts of RAM, different processors, and so forth.

I'd be interested to see if I can increase the speed of throughput to the BTRFS volumes down the USB 3 link. I suspect the little Orico hot-swap cabinet is the bottleneck there. Later, when a larger chassis becomes available I may put the server in a regular mid-tower and put all the disks directly on SATA, leaving the hot-swap for replacing broken disks or later expansion.

NFS network efficiency sync vs async

2016 October 8
by Daniel Lakeland

So, in my previous post I discussed sync vs async operation in an NFS server (not client), and there was a bit of math there that I glossed over. It's worth developing. Suppose that B_w is the network bandwidth, t_s = (60/RPM)/2 = 30/RPM is the average rotational delay (half a rotation of the disk), and S_b is the block size of the NFS network transfer. Then, with sync on the export, writing each block puts S_b bytes on the disk in a time T = S_b/B_w + t_s, so the bandwidth of transfer to disk as a fraction of the network bandwidth (the efficiency) is:

\frac{\frac{S_b}{\frac{S_b}{B_w} + \frac{30}{RPM}}}{B_w}

Which after some algebra can be expressed as:

\frac{1}{1 + \frac{30 B_w}{S_b RPM}}

So, as the bandwidth of the network increases, efficiency decreases, and this can only be compensated by increasing the block size of the transfer. In the end, to get high efficiency, it must take much longer than a rotation of the disk to transfer each block. That suggests something like a 500 MB block size for gigabit ethernet! Hmm...
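To see where that number comes from, solve the efficiency formula for the block size needed to reach a target efficiency e:

S_b = \frac{30 B_w}{RPM}\cdot\frac{e}{1-e}

With B_w = 1024^3/8 bytes/s and a 5600 RPM disk, e = 0.99 already demands S_b of about 71 MB, and e of roughly 0.9986 is where the 500 MB figure lands.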

Alternatively, we can do async transfer. Then the time to transfer the data D across the wire is D/B_{we}, and the time to write it to disk is more or less one seek, plus the time to write the data to the disk in one large block, and then maybe another seek. The efficiency is then something like:

\frac{D}{\mathrm{max}(D, \frac{DB_{we}}{B_{wd}}) + \frac{30B_{we}}{RPM}}

Where now B_{we} is the ethernet bandwidth and B_{wd} is the disk bandwidth. Assuming disks are similar bandwidth to gigabit ethernet, we can simplify to

\frac{1}{1+\frac{30B_{we}}{D\times RPM}}

In other words, it's just like the first case, except instead of the 1 MB network block size, the efficiency depends on D the full size of the file. I suppose we can get a little more fancy but as a first approximation, clearly the scaling of the "async" option gets us MUCH closer to 100% efficiency with large file transfers. The big problem is that if someone wants to ensure that things are really written to disk, NFS seems to treat explicit fsync() as a no-op when async is on.

Regular local filesystems like ext4 already buffer writes and then flush them periodically in large batches. So having NFS insist on flushing 1MB writes is basically like mounting your ext4 filesystem with the "sync" option, something NO-ONE does.

But consider back in the bad old days when NFS was designed. Look at the formula for efficiency under the "sync" condition, and plug in a B_w of 1 Mbps (coaxial ethernet, 125 kB/s) and an S_b of 8kB:

\frac{1}{1 + \frac{30 \times 1000/8}{8\times 5600}} = 0.92

In the bad old days, no one noticed an efficiency problem, because it took so long to transfer 8kB at 1 Mbps that an extra half-rotation of the platters wasn't a factor.


Modern NFS performance issues

2016 October 7
by Daniel Lakeland

So, there are lots of HOWTOs and things on tuning NFS available on the internet. However, they are almost uniformly about ancient versions of NFS, such as NFS v2 and NFS v3 circa 2003. In computer terms, decade-old performance tuning information is like telling people how to get the most effective cracking sounds out of their buggy whips.

So, here's a tiny bit of info about what helped me get good performance out of NFS on btrfs on modern Linux systems over a small Gigabit LAN network.

The Issues:

What I was finding was that NFS had terrible write performance and fine read performance. I'm not alone; it's a problem many people have discussed to very little conclusion.


Specifically, in writing on the order of 100MB or more, it would vary between about 12 MB/s and at best about 44MB/s over a gigabit ethernet cable about 15 feet long with one switch in the middle. iperf would push 900 Mb/s down the wire no problem (around 112 MB/s, where MB = megabytes and Mb = megabits, 8 bits per byte), so it wasn't a network issue. Reading the data would give around 100MB/s, which is around what you expect from gigabit networks.

So, the networking hardware was able to get close to the theoretical gigabit level, but the NFS protocol writing data was about 5 times slower.

Furthermore, occasionally my system would just freeze for several seconds at a time. That happened especially when I was doing something like a scrub or rebalance on the btrfs filesystem that NFS was on top of, when I could get full system freezes lasting 10 or 20 seconds on the NFS client computer (my desktop).

The Setup:

NFS version 4.1 with krb5p security running over TCP (v4.x always uses TCP), on a small server in my closet: a multi-core Celeron micro-server board with quality Intel gigabit NICs and a USB 3.0 multi-drive cabinet holding 4 drives in RAID10 via btrfs, serving my home directory to a Core i7 desktop machine. The hardware can push the data over the wire no problem. Bandwidth to the disks using "hdparm -t" was around 100MB/s for each disk, and USB 3.0 can do several hundred MB/s over the bus. In theory I should be able to write 70-100MB/s mirrored without problem. In fact, dd can write a gig to this array, calling fsync at the end, at 70MB/s locally. Without the fsync, the OS will pretend by putting it all in RAM at 1.1 GB/s and then flushing it to disk 5 seconds later.

So, what was wrong? I found reports of this issue but couldn't get a resolution. In the end, the thing that changed my system from an inconsistent 15-45 MB/s to a consistent 100MB/s was to put the "async" option in my exports and to mount the btrfs /home directory with the commit=5 option:

/exports        *(rw,no_subtree_check,async,sec=krb5i,fsid=0)
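And the matching btrfs mount on the server, as a line in /etc/fstab (the device path here is illustrative):

/dev/sdb  /home  btrfs  defaults,commit=5  0  0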

Why did this work? As I understand it, btrfs by default waits up to 30 seconds before committing data to the disk. But then, when it does commit the data, it tends to take several seconds. You can see this by doing a bunch of test I/O and then calling "sync" on the command line; it might take 4 or 5 seconds to fully flush the disk on a RAID10 over USB 3.0. I believe this was causing the very long freezes: if I sshed into the server and issued a "sync", in fact it would freeze my desktop for several seconds until the sync completed.

What's going on with NFS? Well, with the "sync" option on the export, NFS syncs every single block you write. The default NFSv4 mount options ask for a 1MB block size, and the client and server negotiate down from there to possibly smaller sizes. But this means you're forcing btrfs to write every 1MB to the disk, and ensure it's on disk, as it comes down the wire. Now, a gigabit network transfer is going to take 8 milliseconds to send that 1MB block. In other words, you're going to ask it to write to the disk about 100 times per second. On average the disk needs to wait while the platter rotates into position and the head settles before each write; at 5600 RPM call it roughly 10ms per write. So it takes 8 milliseconds to send a MB, then around 10ms to get it on disk. Only 8/18 of the time is spent actually transferring data on the wire. If full bandwidth is 100MB/s, then 8/18*100 = 44 MB/s is exactly the best case situation I was seeing. Put in some extra hoo-ha of btrfs overhead and ensuring everything is written twice (mirrored in RAID10) and whatnot, and you'll see disk write performance between say 12 and 45 MB/s, just like I was.
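If you want to see or pin the block size yourself, the client-side rsize/wsize mount options control it; something like this (server name and paths are examples, and I'm omitting the security options):

mount -t nfs4 -o rsize=1048576,wsize=1048576 myserver:/ /home

The negotiated values show up in /proc/mounts on the client.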

Well, that sucks. But so does losing data. What happens when we put "async" in the mount options?

Now, instead of writing EACH block to the disk as it comes in, each write block is buffered up and flushed to disk only periodically, just like on a modern local file system. For example, locally, if you write to your btrfs on spinning rust, you'll write a bunch of data, the kernel will put it in buffers, and then after the amount of time in the "commit" option for the btrfs mount, it will get flushed out to disk.

If your application wants to checkpoint something and make sure it's really on disk, it can manually call "fsync" to flush the file to disk. Unfortunately, it appears this is a no-op when async is on the export. But async does stop the NFS server from syncing each 1MB block to disk. It's not great that fsync doesn't work right, but it may be better than waiting for each 1MB block to flush to disk. In any case, consider how you would write a program to put a big bunch of data on the disk.

Option 1 (in pseudocode):

for i in 1 to 100
  write block[i]
fsync

vs the alternative, where you wait for each block:

for i in 1 to 100
  write block[i]
  fsync            # wait for block[i] to actually hit the disk

Very few people would write the second version, because they figure first they'll write the data to the OS, and then they'll make sure it's on disk at the end. What good is having just part of the file on disk? Well, exporting NFS with "sync" turns the first version into the second.

So, with the "sync" option, NFS is taking the approach "get every single block on disk before you acknowledge it", which is way more conservative than even a regular local filesystem. With the "async" option, it's saying "yes I got your data" as soon as it has it in a buffer, and then it flushes it out to disk according to the kernel's schedule, or hopefully according to when the application calls fsync... except that, as noted, fsync appears to be a no-op. It would make sense to me to have an option to the kernel NFS server to make it reply to writes asynchronously but NOT to reply to fsync asynchronously.

So, it seems to me like you want "async" on every export, and you want a UPS on your server (duh). With a UPS and reasonably short "commit" intervals, you recover both better performance and better reliability (better reliability because a 30 second freeze-up on your desktop machine is likely to send a user reaching for the reset switch, and also because of the UPS...). Now, with btrfs committing more frequently, you also avoid the "someone called fsync and now my firefox window is hung for 23 seconds while btrfs flushes a couple megabytes of data via copy-on-write" problem.

Remember, 5 seconds is an eternity for a gigabit ethernet. You can push around 600 MEGABYTES of data over the wire in 5 seconds. So the typical 30 second commit interval could in theory force btrfs to write 3.75 GIGABYTES of data before it can return from an fsync call. Keep the NFS server in "async" mode, and the commit interval down to a few seconds if you're serving NFS off btrfs on Linux on Gigabit ethernet.

Oh, and kernel people, are you listening? Make "async" on the export give async writes of individual blocks, but still sync on "fsync".


On btrfs linux desktops and NFSv4 home directories

2016 October 2
by Daniel Lakeland

In my house I run a small file server in my closet. It lets me store all my family photo albums, music, videos, and daily work in a way that is accessible from any computer in the house. It's useful for making sure my wife can see our photo albums and so forth, as well as giving me an extra measure of safety when running up-to-date software on my desktop machine (kernels, video drivers, Gnome desktops, etc.). If my desktop machine crashes hard and requires a hardware reset, at least my files are on a different machine. It also makes backups pretty easy: just slot an HD into a USB docking station and run a simple rdiff-backup script.

Well, this little box in the closet had a single hard drive that was about 4 years old, and its SMART stats were not terrible, but not stellar. My inclination is to replace drives every 4 years or so to avoid data loss and downtime, so this time I built out a 4 disk array using an Orico USB 3.0 five tray hot swap enclosure (still waiting for the USB 3.0 low-profile card so it's running on a USB 2.0 port for the moment). In order to deal with the RAID in a modern way, I chose to build a btrfs filesystem on the array.

Here are some things you really should do if you want to build a btrfs RAID system and export it via NFS.

  1. Use SATA or USB 3.0, since USB 2.0 is too slow. This may seem obvious, but if you don't have a USB 3.0 port, you might be tempted. Just get a USB 3.0 card.
  2. Build the btrfs with the raid level you want. I started with 3 drives that I'd had sitting around, previously built into a btrfs RAID 1 array, which is pretty silly since RAID 1 is mirroring so you really want an even number of drives. Anyway, I added a fourth drive and did "btrfs balance start -mconvert=raid10 -dconvert=raid10 /home" to convert it to RAID 10; over USB 2.0 that took about 24 hours for around 650 GB of data. In the process it made using the filesystem over NFS painfully slow, BUT it DID work, and it didn't lose any data. It even did the right thing initially when I hot-swapped in the new drive and all the drives went offline (something I didn't expect, but probably related to the USB controller in the little Orico cabinet). This caused btrfs to bork, but it did so safely, and a reboot and scrub didn't find errors.
  3. If you're going to hot-swap a drive in a USB enclosure, do so by unmounting the filesystem and slotting it in, and then remounting the filesystem, just in case.
  4. Have good backups. Since my data is still small enough to fit on a single drive, I have two drives that rotate through backups. One sits in a bank safe deposit box and one sits at home. They're formatted to ext4 which means my backups aren't subject to the same potential filesystem bugs.
  5. Use an up to date kernel and up to date btrfs-progs. Check the wiki, find out which versions are recommended. I'm running kernel 4.6.0 and btrfs-progs 4.7.3 and I did some pretty intense btrfs operations over this period without errors.
  6. If you are exporting your home directory via NFS, do NOT run Gnome etc. with the directories "~/.local/share", "~/.config", or "~/.cache" on the NFS mount. Instead, make those links to local storage (see the sketch after this list). Yes, each machine will have a different config, but trust me: dconf doesn't work right, and caching on NFS is a bad idea; that configuration will cause your machine to hang for 3 to 5 seconds at a time, if not crash outright. Use the local storage approach and back up the local storage to a tar file on the NFS home via a cron job.
  7. Run btrfs scrub on your array monthly as recommended by the btrfs wiki, but don't expect to be able to do lots of stuff while it's running. So get that USB 3.0 card, and schedule the scrub via cron so it runs while you sleep (a sample cron entry follows this list).
  8. Three words: Backups, backups, and off-site backups
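For item 6, the relocation looks roughly like this (the local target directory is my own choice here; adapt to taste):

LOCAL=/var/local/$USER
mkdir -p $LOCAL
for d in .cache .config; do
    mv ~/$d $LOCAL/$d && ln -s $LOCAL/$d ~/$d
done
mv ~/.local/share $LOCAL/share && ln -s $LOCAL/share ~/.local/share

And for item 7, a cron entry along these lines in /etc/cron.d (assuming the array is mounted at /home):

# monthly btrfs scrub: 3am on the 1st, while everyone sleeps
0 3 1 * * root /usr/bin/btrfs scrub start -B /home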

My experience suggests that fully up-to-date btrfs versions, such as kernel 4.6.0 and btrfs-progs 4.7.3, are usable in RAID 0, 1, or 10 (RAID 5/6 is not recommended on the btrfs wiki) on a file server with a UPS. I'd be wary of running it on a desktop machine, where you might occasionally have a kernel lockup due to a video driver issue, or because you tried to load a massive dataset for analysis into RAM and the machine slowed to a crawl of the takes-three-months-to-respond-to-Ctrl-C variety, or where the power might go out, or someone might trip on the USB cord, or whatever.


How bufferbloat kills your VOIP and other low-latency communications

2016 September 29
by Daniel Lakeland

So, if you use a VOIP phone, or Skype, or play real-time internet games, or do lots of video chats, you may notice that sometimes your connection becomes "SLOOOW"; in particular, there is a large lag between when you do something and when that thing registers on the internet. Symptoms include both sides of a conversation talking over each other, for example.

One way to deal with this is QOS traffic shaping. That is, to define different queues for different packets. I've covered that before on this blog. But one thing that is typically recommended is that you QOS traffic shape only on the up-link, that is packets you're sending to the internet. Packets you receive from the internet, typically you'd just send them to their destination machine on your network as fast as possible.

Well, it turns out that this is bad advice. Yes, you should traffic shape your upstream packets, but, there is good reason to actually traffic shape your download packets as well. The reason is, Bufferbloat.

Somewhere on the other end of your cable service (or DSL or WISP or whatever) is a router that is sending your router packets. That router may have a large buffer in it. If something like a streaming video player on your network requests a big block of video data, the connection between Amazon or whatever and the router on the other end of your last-mile cable could be a lot faster than the cable connection you have. When you request all that video data, the remote router will get an initial flood of data and slam it all into a buffer to be sent to you. Once that buffer fills up, it will stop ACKing packets and Amazon will eventually slow down. The only problem? If you are expecting some voice packets to come in, they will have to wait in line at the remote router until all that video gets sent to you, and if that buffer is big, they will have to wait in line for maybe 1 to 3 SECONDS. That means several seconds of DEAD AIR on your VOIP phone (or several seconds of NO RESPONSE in your game).

The solution is to not let that remote router buffer up stuff. To do that, you have to ensure that your router drops packets on the floor quickly so that Amazon throttles its bandwidth back so that the remote buffer doesn't fill up. The main way you have to achieve this is to use QOS and throttle your download speed slightly below what your ISP offers you. Here's how:

Go to the dslreports speed test and run the test with your QOS turned off, and no-one using your internet connection. Run it two or three times and see how fast your uplink and downlink speeds are. Pick the smallest value you see out of 3 tries for upload and download and multiply by 0.95. These two numbers will be your uplink and downlink speed you will put into your QOS script.

Did the dslreports speed test show bufferbloat of several hundred to a thousand milliseconds? Mine did on download, and it was destroying my VOIP performance. I have about a 60 Mbps download connection, but if I had someone pulling video I could get 1 to 3 second cut-outs in my audio thanks to bufferbloat. Putting 58000 kbit/s as the download speed in my QOS script dropped the bufferbloat measurements from 1000+ ms to 40ms. Now, when my router sees packets arriving faster than 58000 kbit/s, it starts dropping some; the endpoints ACK less data, and the source automatically slows its send rate. This prevents a buffer from blowing up in size at my ISP's router, and now I can control my QOS with low latency!
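For the curious, the download (ingress) half of such a QOS script boils down to something like this. The interface name and rate are mine, and this is only a bare sketch; real scripts (e.g. the sqm-scripts package) do considerably more:

# shape incoming traffic by redirecting it through a virtual ifb device
modprobe ifb
ip link set dev ifb0 up
tc qdisc add dev eth0 handle ffff: ingress
tc filter add dev eth0 parent ffff: protocol all u32 match u32 0 0 \
    action mirred egress redirect dev ifb0
# throttle to just under the ISP rate; fq_codel keeps the queue short
tc qdisc add dev ifb0 root handle 1: htb default 10
tc class add dev ifb0 parent 1: classid 1:10 htb rate 58000kbit
tc qdisc add dev ifb0 parent 1:10 fq_codel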

You're welcome!

Thinking about gamma priors

2016 September 21
by Daniel Lakeland

Suppose you've got a parameter that's a positive number whose order of magnitude you know. Take my height, for example: you don't know what it is, but if I asked you whether it was negative you'd say with 100% logical certainty that it isn't, if I asked whether it was 3 inches you'd be damn sure it isn't, if I asked whether it was 5 ft you might think that's reasonable, and 20 feet is certainly unreasonable... A typical average height for an adult male is something like 5ft 10 inches (178 cm). So it would be safe to say something like

h ~ exponential(1.0/178)

for an adult male height in cm. But the peak density of this distribution is at h=0, and it extends well out to 3 or 4 times the average height. That seems problematic.

Here's where the gamma distribution comes in. The average of a gamma(k, 1/x) random variable is kx. In fact, if a and b are both exponential(1/x), then their sum is gamma(2, 1/x), and the sum of n exponential variables is gamma(n, 1/x) for integer n. But n doesn't need to be an integer: the shape parameter n is continuous, and it "acts like" a sample size. That is, as n increases, holding the average value constant, the gamma(n, n/x) distribution becomes more delta-function-like around the average value x.

If all you know is that a variable is positive and has a typical value x, you can use exponential(1/x) as the maximum entropy distribution for a given average x. If you also know that the distribution is more concentrated near the average (and away from both zero and infinity), then you can use gamma(n, n/x) as a continuously "less than maximum entropy" distribution, where n is an "effective sample size"; thinking of it that way can help you choose n. So if you think you have about as much information about my height as if you'd found, say, 12 randomly selected adult males from the US, you can use gamma(12, 12.0/178) as a prior for my height.

What does that distribution look like? It has 95% interval 92 to 292 cm (3 to 9.6 ft) which is probably pretty reasonable for a height considering that I could for all you know be a dwarf or a basketball player...
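A quick check of that interval in R (qgamma parameterized by shape and rate):

qgamma(c(0.025, 0.975), shape = 12, rate = 12/178)
## roughly 92 and 292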

The gamma distribution is in fact a maximum entropy distribution. Specifically, based on the explanation on wikipedia:

The gamma distribution is the maximum entropy probability distribution for a random variable X for which E[X] is fixed and greater than zero, and E[ln(X)] is fixed. [edited for clarity]

That is, you know what the average is, and you know what the average logarithm is (think of the average logarithm as a limit on the order of magnitude).
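To spell out the connection (a standard exponential-family fact): maximizing entropy subject to fixing E[X] and E[\ln X] forces a density of the form

p(x) \propto \exp(\lambda_1 x + \lambda_2 \ln x) = x^{\lambda_2} e^{\lambda_1 x}

which is the gamma density x^{k-1}e^{-x/\theta} up to normalization, with k = \lambda_2 + 1 and \theta = -1/\lambda_1 (for \lambda_1 < 0).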


The WiFi experience a week out

2016 September 12
by Daniel Lakeland

So, I've had my 3 station home WiFi network up and running for about a week, and the question is, does it make any real difference? Specifically does it make a difference with respect to 2.4 GHz which is the more congested band.

First off, you should understand the situation at my house. There are 4 Amazon Fire tablets, a Fire TV stick, and two cell phones; my wife uses a Mac laptop; if friends or family are visiting, add at least a tablet and a cell phone; and our home phone system runs two incoming lines via VOIP. Worst case, with my mother visiting the kids, we could have 4 Netflix streams and two phone calls all going on simultaneously. The cable internet connection is 60 Mbps downstream and not quite 4 Mbps upstream, verified via several online speed tests from my wired desktop machine.

The first thing to say is that in fact the three stations are all being used. My kids sitting in the front room playing Minecraft on a tablet computer or watching a video will wind up on the station in the front room, whereas if I'm in my office or bedroom I'll wind up on a different station. This does in fact seem to help when talking on my phone over a VOIP connection. People haven't been complaining about stuttering and that makes sense because there won't be contention for the wireless connection, and if the packets can get to my router in time, my QoS script at the router will prioritize the VOIP streams.

Another thing I was concerned about was what happens when I walk around the house while on the phone? Hand-offs between the different stations, if they're occurring, do not seem to be causing any problems.

How about coverage? Inside our house was always pretty good, but a phone conversation in the back yard was out of the question. The new setup seems to let me talk from anywhere on my property, and because the xmit strength is turned down and I chose my channels not to interfere, by the time I get across the street, or into the neighbors' yards, I'm not bothering the neighbors.

My wife needed to pull some 10 GB of video data off our server (videos she took for her lab work) and throughput to her Mac laptop was sufficiently high that she didn't notice the file copy (ie. by the time she'd checked Facebook or Reddit or whatever it was done).

So, overall, it certainly didn't hurt, and it has helped extend coverage, reduce latency/contention from up to maybe 10 devices in our house being used simultaneously, without interference to our neighbors.


Multi-Station Home WiFi with WDS 5GHz backbone

2016 September 7
by Daniel Lakeland

You might be a computer nerd if you run OpenWRT on your home wifi router. If you do, you might want to take advantage of the following setup.

My main router used to be the only access point I had, it ran OpenWRT on a Buffalo WZR-600DHP which is a dual band 802.11n router. Pretty good kit for not much money. It has QoS settings, IPv6, special firewall settings, an OpenVPN set-up and lots of good stuff going.

Well, I run CSipSimple on my cell phone and take calls via VOIP over WiFi, and the coverage in my house (old-school concrete-based plaster walls) was just OK. People would complain if my kids streamed videos while I was on the phone, etc. Once the packets hit the router it's possible to QoS them upstream to the ISP, but it's harder to share just one radio channel.

A while back I grabbed two TP-Link WDR3600 routers, which are somewhat less fancy but dual band and work well with OpenWRT. My plan was to pull Cat5e under my house to the front room and the back room, and wire them in so I could use channels 1, 6, and 11 to get good coverage. Of course, I'd be nice to the neighbors, so I'd turn DOWN the power output to just the required level.

Well, running the wires never really happened, and I was thinking to myself "Gee I have these two routers, can I use them without the wires?"

The answer, of course, is YES. Specifically, you can use the 5GHz link in WDS mode to distribute the local network to each of the routers, and this will still let you have a 2.4GHz access point in multiple locations. Since 5GHz can effectively be set up with 40MHz channel width and generally sees less interference, it's actually perfect for distributing wireless WITHIN your house, while 2.4GHz is great for getting coverage on your back deck, front yard, various bedrooms, and throughout your house for devices that only have 2.4GHz (which is still a lot of devices).

Here's how the system works:

  1. On the "central" router, set up the 5GHz link in Access Point (WDS) mode (you might make this an "additional" network, with its own "backbone" ESSID if you like, but then if you do, the clients need to use that ESSID, and you need to bridge it on the central router)
  2. On each satellite router, delete the WAN network and set up the 5GHz wifi in Client (WDS) mode with the same ESSID as in (1) (see the config sketch after this list).
  3. Add an additional wifi network to the 5GHz link in "Access Point" mode (if you use just one ESSID set that, otherwise set up the "normal" client ESSID)
  4. Set up the satellite 2.4GHz network in Access Point mode (using the "normal" client ESSID)
  5. Make sure all the WiFi networks on the satellite device have the same WPA2 settings and pre-shared-key.
  6. Set up the satellite station's LAN network to bridge the 5GHz Client, 5GHz AP, 2.4GHz AP, and wired ethernet into one LAN.
  7. Set up the satellite station LAN network to be in DHCP client mode (you'll have to do an "are you sure" type acknowledgement of the change).
  8. On the central router add static DHCP assignments for your satellite devices so you will be able to access them for maintenance.
  9. Reboot the satellite devices, they should come up with a 5GHz client link to the central router which gives them LAN connectivity, and both 5GHz and 2.4GHz access points for local stations to use.
  10. Lay out your channel usage on 2.4GHz so you're interfering as little as possible with neighbors, and turn down the xmit power on the stations. You have more stations, so each one should cover its own smaller area at low power; they don't all need to SHOUT.
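For steps 2 through 4, the satellite's /etc/config/wireless ends up looking roughly like this (ESSIDs and key are placeholders, and which radio is 5GHz varies by device):

config wifi-iface
        option device 'radio0'     # the 5GHz radio
        option network 'lan'
        option mode 'sta'
        option wds '1'
        option ssid 'backbone'
        option encryption 'psk2'
        option key 'SHARED-KEY'

config wifi-iface
        option device 'radio0'
        option network 'lan'
        option mode 'ap'
        option ssid 'homewifi'
        option encryption 'psk2'
        option key 'SHARED-KEY'

config wifi-iface
        option device 'radio1'     # the 2.4GHz radio
        option network 'lan'
        option mode 'ap'
        option ssid 'homewifi'
        option encryption 'psk2'
        option key 'SHARED-KEY'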


On Incorporating Assertions in Bayesian Models

2016 August 23
by Daniel Lakeland

I've been thinking a lot recently about how to specify a prior over functions. One way is to specify a Gaussian Process, but this can be quite computationally heavy, and when the function in question is fairly smooth and simple, it's also kind of overkill.


The typical method for specifying a function is via a basis expansion. For example, in the R function lm you can specify a third order polynomial model for y as y ~ x + I(x^2) + I(x^3) (or y ~ poly(x,3); a bare x^2 means something else in R's formula language). This means y = a + bx + cx^2 + dx^3 + \epsilon, and lm uses linear least squares to fit it.

Well, when fitting this sort of thing in a full Bayesian model, you should provide priors over a,b,c,d. Typically we can specify some very simple thing about each, such as maybe

a ~ normal(0,100) if you know that the size of the coefficients is typically a small multiple of 100.

But usually we have some information not about the individual coefficients themselves, but about some property of the function. For example, we might know that the function is "near" to some constant value, doesn't change too rapidly, and doesn't wiggle too rapidly:

\int_0^1 f(x) dx \sim \mu

\sqrt{\int_0^1 (\frac{d}{dx} f(x))^2 dx} \sim S

\sqrt{\int_0^1 (\frac{d^2}{dx^2}f(x))^2 dx}\sim Q

From the perspective of MCMC, we can calculate numerical quantities either using symbolic methods or by numerical integration, and then place probabilities on these outcomes. Note, these quantities are in general nonlinear functionals of the f(x) function. The parameter space in general will be high dimensional (let's say for example 15 coefficients in a polynomial fit) and the number of conditions for which we have prior information are less than 15, so there is no one-to-one mapping from coefficients to \mu,S,Q. As such, the prior you're going to place here is in part determined by the "multiplicity" of the outcome. Some \mu values can occur in more ways than other \mu values... This is just a fact of life about a mismatch between our state of information and the dimensionality of the parameters.

As an example, here's a complete R script using rstan and ggplot2 to put a distribution on a 3rd order polynomial on the region x \in [0,1]. To calculate the mean value, RMS slope, and RMS curvature, I use a super-simple 3-point midpoint integration rule. I specify that the mean value is normally distributed around 100, the RMS slope is of typical size 10, and the RMS second derivative is on the order of 1000. Run the script and you'll get 20 random curves plotted; you'll see that they are "flattish", with some mild wiggles, on the scale of a function whose values are around 100.

Add some data generated by your favorite function with some random measurement error, and add a normal likelihood, and see what happens. Add some additional parameters aa,bb,cc,dd which have uniform(-1e6,1e6) priors and the same normal likelihood (essentially the least-squares solution). Compare the posterior fits especially with only a few data points when you provide smoothness information compared to the unconstrained least squares solution.

Does the very low quality numerical integration routine matter? What happens if you add some additional higher order terms? If you have N data points and N coefficients, you can fit the curve directly through the points. That will inevitably be the least squares solution. Does that happen when you provide this kind of smoothness prior?

Note: it's problematic to do basis expansion in the 1,x,x^2,x^3... basis, usually you'd work with an orthogonal polynomial, and deal with numerical evaluation of the polynomial using special techniques for stability and to prevent cancellation problems etc. But this is just an example, and the technique should work well even for other types of basis expansions, such as radial basis functions which are often a good idea for strangely shaped domains (2D or 3D functions for example).

Above are pairs of scatterplots for samples of a,b,c,d, showing how the effective prior looks.


library(rstan)
library(ggplot2)

stanmodl <- "
functions{
real f(real x, real a, real b, real c, real d){
  return a + b*x + c*x^2 + d*x^3;
}
real fp(real x, real a, real b, real c, real d){
  return b + 2*c*x + 3*d*x^2;
}
real fp2(real x, real a, real b, real c, real d){
  return 2*c + 6*d*x;
}
// 3-point midpoint-rule approximations of the functionals on [0,1]
real favg(real a, real b, real c, real d){
  return (f(1.0/6,a,b,c,d) + f(0.5,a,b,c,d) + f(1-1.0/6,a,b,c,d))/3;
}
real slfun(real a, real b, real c, real d){
  return sqrt((fp(1.0/6,a,b,c,d)^2 + fp(0.5,a,b,c,d)^2 + fp(1-1.0/6,a,b,c,d)^2)/3);
}
real qfun(real a, real b, real c, real d){
  return sqrt((fp2(1.0/6,a,b,c,d)^2 + fp2(0.5,a,b,c,d)^2 + fp2(1-1.0/6,a,b,c,d)^2)/3.0);
}
}
parameters{
real a;
real b;
real c;
real d;
}
transformed parameters{
real fbar;
real q;
real slavg;
fbar = favg(a,b,c,d);
q = qfun(a,b,c,d);
slavg = slfun(a,b,c,d);
}
model{
// priors placed on functionals of f; Stan will warn about a missing
// Jacobian, which is expected: this is the multiplicity effect above
fbar ~ normal(100.0,10);
slavg ~ exponential(1/10.0);
q ~ exponential(1/1000.0);
}
"

samps <- stan(model_code=stanmodl);
vals <- extract(samps);

plt <- ggplot(data=data.frame(x=0),aes(x=x))+xlim(0,1)
funs <- list();
for(i in sample(1:2000,20)){
    a <- vals$a[i];
    b <- vals$b[i];
    c <- vals$c[i];
    d <- vals$d[i];
    ## the immediately-invoked function forces its arguments so the
    ## closure captures this iteration's coefficient values
    funs <- c(funs, (function(a,b,c,d){force(a);force(b);force(c);force(d); function(s){a+b*s+c*s^2+d*s^3}})(a,b,c,d))
}
for(i in 1:length(funs)){
    plt <- plt+stat_function(fun=funs[[i]])
}
print(plt)