NFS network efficiency sync vs async

2016 October 8
by Daniel Lakeland

So, in my previous post I discussed sync vs async operation in an NFS server (not client), and there was a bit of math there that I glossed over. It's worth developing. Suppose that B_w is the network bandwidth, B_d is the bandwidth of continuous writes to disk, t_s = 60/RPM/2 = 30/RPM is the average seek time in seconds (taken here as half a rotation of the disk), T_c is the time between commits on the underlying filesystem, and S_b is the block size of the NFS network transfer. Then, with sync on the export, writing each block puts S_b bytes on the disk in a time equal to T_s = S_b/B_w + t_s. So the bandwidth of transfer to disk as a fraction of the network bandwidth (the efficiency) is:

\frac{\frac{S_b}{\frac{S_b}{B_w} + \frac{30}{RPM}}}{B_w}

Multiplying the inner denominator through by B_w and then dividing top and bottom by S_b, this can be expressed as:

\frac{1}{1 + \frac{30 B_w}{S_b RPM}}
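As a quick numerical sanity check of that algebra, here is a minimal Python sketch that evaluates both the unsimplified ratio and the simplified form. The function names and the example figures (gigabit ethernet taken as 125e6 bytes/s, a 1 MB block, a 7200 RPM disk) are my own illustrative assumptions, not measurements.

```python
def sync_efficiency_raw(net_bw, block_size, rpm):
    """Unsimplified form: (S_b / (S_b/B_w + 30/RPM)) / B_w."""
    # Time per block in seconds: wire transfer plus half a rotation.
    time_per_block = block_size / net_bw + 30.0 / rpm
    return (block_size / time_per_block) / net_bw


def sync_efficiency(net_bw, block_size, rpm):
    """Simplified form: 1 / (1 + 30*B_w / (S_b*RPM))."""
    return 1.0 / (1.0 + 30.0 * net_bw / (block_size * rpm))


# Assumed example values: bytes/s, bytes, revolutions per minute.
net_bw, block_size, rpm = 125e6, 1e6, 7200
print(sync_efficiency_raw(net_bw, block_size, rpm))
print(sync_efficiency(net_bw, block_size, rpm))
```

Both lines should print the same value up to floating-point rounding, roughly 0.66 for these assumed inputs.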

So, as the bandwidth of the network increases, efficiency decreases, and this can only be compensated for by increasing the block size of the transfer. In the end, to get high efficiency, it must take much longer than a rotation of the disk to transfer the block. That suggests something like a 500 MB block size for gigabit ethernet! Hmm...
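To see where a figure of that order comes from, the sync efficiency formula can be inverted to give the block size needed for a target efficiency. The sketch below does that; the function name, the 7200 RPM disk, and the choice of target efficiencies are assumptions picked for illustration.

```python
def block_size_for_efficiency(net_bw, rpm, target):
    """Solve 1 / (1 + 30*B_w / (S_b*RPM)) = target for S_b (bytes)."""
    if not 0.0 < target < 1.0:
        raise ValueError("target efficiency must be strictly between 0 and 1")
    # Bytes the network could deliver during one half rotation.
    bytes_per_half_rotation = 30.0 * net_bw / rpm
    return bytes_per_half_rotation * target / (1.0 - target)


# Assumed: gigabit ethernet as 125e6 bytes/s and a 7200 RPM disk.
for target in (0.5, 0.9, 0.99, 0.999):
    size_mb = block_size_for_efficiency(125e6, 7200, target) / 1e6
    print(f"target {target}: about {size_mb:.1f} MB per block")
```

Under these assumptions the required block size lands in the hundreds of megabytes only when the target is pushed to around 99.9%; a 50% target needs about half a megabyte.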

Alternatively, we can do async transfer. Then, the time to transfer the data D across the wire is D/B_w and the time to write it to disk is more or less 1 seek, the time to write the data to the disk in a large block, and then maybe another seek. Efficiency is then something like:

\frac{D}{\mathrm{max}(D, \frac{DB_{we}}{B_{wd}}) + \frac{30B_{we}}{RPM}}

Where now B_{we} is the ethernet bandwidth and B_{wd} is the disk bandwidth. Assuming disk bandwidth is similar to gigabit ethernet (B_{wd} \approx B_{we}), the max term is just D, and dividing through by D we can simplify to

\frac{1}{1+\frac{30B_{we}}{D\times RPM}}

In other words, it's just like the first case, except instead of the 1 MB network block size, the efficiency depends on D the full size of the file. I suppose we can get a little more fancy but as a first approximation, clearly the scaling of the "async" option gets us MUCH closer to 100% efficiency with large file transfers. The big problem is that if someone wants to ensure that things are really written to disk, NFS seems to treat explicit fsync() as a no-op when async is on.
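The async efficiency expression above (the version with the max term) is easy to evaluate for a range of file sizes. This is a minimal sketch, assuming equal network and disk bandwidths of 125e6 bytes/s and a 7200 RPM disk; those numbers and the function name are mine, chosen only to show the scaling with D.

```python
def async_efficiency(net_bw, disk_bw, data_size, rpm):
    """D / (max(D, D*B_we/B_wd) + 30*B_we/RPM), with sizes in bytes."""
    # Total time expressed in "bytes the network could have carried".
    transfer_term = max(data_size, data_size * net_bw / disk_bw)
    half_rotation_term = 30.0 * net_bw / rpm
    return data_size / (transfer_term + half_rotation_term)


# Assumed: network and disk both at 125e6 bytes/s, 7200 RPM disk.
for size in (1e6, 1e7, 1e8, 1e9):
    eff = async_efficiency(125e6, 125e6, size, 7200)
    print(f"D = {size / 1e6:.0f} MB: efficiency {eff:.4f}")
```

With D equal to 1 MB this gives the same number as the sync case with a 1 MB block, and it climbs toward 1 as D grows.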

Regular local filesystems like ext4 already buffer writes and then flush them periodically in large batches. So having NFS insist on flushing 1MB writes is basically like mounting your ext4 filesystem with the "sync" option, something NO-ONE does.

But consider the bad old days when NFS was designed. Look at the formula for efficiency under the "sync" condition. Put B_w of 1 Mbps coaxial ethernet (1000/8 = 125 kB/s), S_b of 8 kB, and a 5600 RPM disk into the network bandwidth efficiency formula:

\frac{1}{1 + \frac{30 \times 1000/8}{8\times 5600}} = 0.92

In the bad old days, no one noticed an efficiency problem, because it took so long to transfer 8kB at 1 Mbps that an extra half-rotation of the platters wasn't a factor.
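For comparison, here is a small sketch that plugs the old-days numbers into the sync formula and then swaps in gigabit ethernet. It works in kB and kB/s to match the arithmetic above. The gigabit rows (125000 kB/s, with 8 kB and 1000 kB blocks on the same 5600 RPM disk) are my own assumed scenarios, not figures from a real system.

```python
def sync_efficiency(net_bw_kb_s, block_kb, rpm):
    """1 / (1 + 30*B_w / (S_b*RPM)), with B_w in kB/s and S_b in kB."""
    return 1.0 / (1.0 + 30.0 * net_bw_kb_s / (block_kb * rpm))


# (label, network bandwidth in kB/s, block size in kB, disk RPM)
scenarios = [
    ("1 Mbps, 8 kB blocks", 1000 / 8, 8, 5600),
    ("gigabit, 8 kB blocks (assumed)", 125000, 8, 5600),
    ("gigabit, 1000 kB blocks (assumed)", 125000, 1000, 5600),
]

for label, bw, block, rpm in scenarios:
    print(f"{label}: {sync_efficiency(bw, block, rpm):.2f}")
```

The first row reproduces the 0.92 figure; the assumed gigabit rows come out far lower, which is the efficiency problem described at the start.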

