NFS network efficiency: sync vs async
In my previous post I discussed sync vs async operation on an NFS server (not the client), and there was a bit of math there that I glossed over. It's worth developing. Suppose that $B_w$ is the network bandwidth, $B_d$ is the bandwidth of continuous writes to disk, $t_s$ is the average seek time (half a rotation of the disk), $t_c$ is the time between commits on the underlying filesystem, and $S_b$ is the block size of the NFS network transfer. Then, with sync on the export, writing each block puts $S_b$ bytes on the disk in a time equal to $S_b/B_w + t_s$. So the bandwidth of transfer to disk as a fraction of the network bandwidth (the efficiency) is:

$$E = \frac{1}{B_w} \cdot \frac{S_b}{S_b/B_w + t_s}$$

Which after some algebra can be expressed as:

$$E = \frac{1}{1 + B_w t_s / S_b}$$
So, as the bandwidth of the network increases, efficiency decreases, and this can only be compensated for by increasing the block size of the transfer. In the end, to get high efficiency, it must take much longer than a rotation of the disk to transfer the block. That suggests something like a 500 MB block size for gigabit ethernet! Hmm...
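As a rough numerical check of the sync case, here is a minimal Python sketch. It assumes each block costs its wire time plus one average seek, so the efficiency is 1 / (1 + B_w * t_s / S_b). The 7200 rpm disk (half rotation of about 4.17 ms) and the function names are illustrative assumptions, not measurements.

```python
def sync_efficiency(wire_bw, seek_time, block_size):
    """Fraction of the network bandwidth achieved with a sync export.

    Assumes each block takes block_size / wire_bw seconds on the wire
    plus one average seek before the next block can start.
    Units: wire_bw in bytes/s, seek_time in s, block_size in bytes.
    """
    return 1.0 / (1.0 + wire_bw * seek_time / block_size)


def block_size_for(target, wire_bw, seek_time):
    """Block size in bytes needed to reach a target efficiency (0 < target < 1)."""
    return wire_bw * seek_time * target / (1.0 - target)


GIGABIT = 1e9 / 8                 # gigabit ethernet in bytes/s
HALF_ROTATION = 0.5 * 60 / 7200   # seconds; assumes a 7200 rpm disk

for size in (64e3, 1e6, 50e6, 500e6):
    eff = sync_efficiency(GIGABIT, HALF_ROTATION, size)
    print(f"{size / 1e6:8.3f} MB block -> efficiency {eff:.3f}")

needed = block_size_for(0.999, GIGABIT, HALF_ROTATION)
print(f"99.9% efficiency needs about {needed / 1e6:.0f} MB per block")
```

Under these assumptions a 1 MB block gets roughly 66% of gigabit wire speed, and reaching 99.9% takes a block of about 520 MB, which is where a figure like 500 MB comes from.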
Alternatively, we can do async transfer. Then the time to transfer the data $D$ across the wire is $D/B_w$, and the time to write it to disk is more or less one seek $t_s$, the time $D/B_d$ to write the data to the disk in a large block, and then maybe another seek. Efficiency is then something like:

$$E = \frac{D/B_w}{2 t_s + D/B_d}$$

where $B_w$ is the ethernet bandwidth and $B_d$ is the disk bandwidth. Assuming disks have bandwidth similar to gigabit ethernet ($B_d \approx B_w$), we can simplify to

$$E = \frac{1}{1 + 2 B_w t_s / D}$$
In other words, it's just like the first case, except that instead of the 1 MB network block size, the efficiency depends on the full size of the file. I suppose we could get a little fancier, but as a first approximation, the scaling of the "async" option clearly gets us MUCH closer to 100% efficiency on large file transfers. The big problem is that someone who wants to ensure that things are really written to disk can't: NFS seems to treat an explicit fsync() as a no-op when async is on.
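To see how the async estimate scales with file size, here is a small Python sketch. It assumes the disk side costs two seeks plus one large sequential write, and compares that with the wire time for the same data. The 7200 rpm seek figure, the equal disk and wire bandwidths, and the cap at 1.0 are my assumptions for illustration.

```python
def async_efficiency(wire_bw, disk_bw, seek_time, file_size):
    """Rough efficiency of an async export for one file of file_size bytes.

    Wire time is file_size / wire_bw; disk time is assumed to be two seeks
    plus one sequential write of the whole file. The result is capped at
    1.0, since the disk cannot receive data faster than the wire delivers it.
    """
    wire_time = file_size / wire_bw
    disk_time = 2 * seek_time + file_size / disk_bw
    return min(1.0, wire_time / disk_time)


GIGABIT = 1e9 / 8                 # bytes/s
HALF_ROTATION = 0.5 * 60 / 7200   # seconds; assumes a 7200 rpm disk

for size in (1e6, 100e6, 1e9):
    eff = async_efficiency(GIGABIT, GIGABIT, HALF_ROTATION, size)
    print(f"{size / 1e6:8.0f} MB file -> efficiency {eff:.4f}")
```

With these numbers a 1 MB file only reaches about 49%, but a 100 MB file is already near 99% and a 1 GB file near 99.9%, so the gain comes entirely from amortizing the seeks over the whole file.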
Regular local filesystems like ext4 already buffer writes and then flush them periodically in large batches. So having NFS insist on flushing every 1 MB write is basically like mounting your ext4 filesystem with the "sync" option, something NO ONE does.
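But consider the bad old days when NFS was designed. Look at the formula for efficiency under the "sync" condition, and put in a $B_w$ of 1 Mbps coaxial ethernet and an $S_b$ of 8 kB. The wire time per block is $S_b/B_w \approx 64$ ms, so with $t_s$ the average seek time:

$$E = \frac{1}{1 + B_w t_s / S_b} \approx \frac{1}{1 + t_s / (64\ \mathrm{ms})}$$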
In the bad old days, no one noticed an efficiency problem, because it took so long to transfer 8 kB at 1 Mbps that an extra half-rotation of the platters wasn't a factor.
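Plugging the old-hardware numbers into the same sync estimate, 1 / (1 + B_w * t_s / S_b), gives a feel for this. The sketch below uses the 1 Mbps and 8 kB figures from the text; the two disk rotation speeds are assumptions, since the actual disks of the era aren't specified here.

```python
def sync_efficiency(wire_bw, seek_time, block_size):
    """Sync-export efficiency: wire time per block plus one average seek."""
    return 1.0 / (1.0 + wire_bw * seek_time / block_size)


OLD_WIRE = 1e6 / 8    # 1 Mbps in bytes/s, the figure used in the text
OLD_BLOCK = 8e3       # 8 kB block, taken as 8000 bytes

wire_time = OLD_BLOCK / OLD_WIRE   # seconds on the wire per block
print(f"wire time per block: {wire_time * 1e3:.0f} ms")

for rpm in (3600, 7200):           # assumed rotation speeds
    half_rotation = 0.5 * 60 / rpm
    eff = sync_efficiency(OLD_WIRE, half_rotation, OLD_BLOCK)
    print(f"{rpm} rpm: half rotation {half_rotation * 1e3:.2f} ms, efficiency {eff:.3f}")
```

Even with the slower assumed disk, the estimate stays at roughly 88% or better, because the 64 ms wire time dwarfs a half rotation of a few milliseconds.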