Bus errors, core dumps, and binaries on NFS (2018) (rachelbythebay.com)
48 points by signa11 on March 7, 2020 | 14 comments


Reminds me of the differences between `cp` and `install` for putting .so files in place.

`install` basically unlinks the old file and creates a new one, so the destination gets a new inode and processes can keep a handle on the old version of the .so.

`cp` copies the content over the old content, keeping the same inode. Now applications that had that .so open won't be happy.
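Roughly, in C (file name and contents are just placeholders; GNU install effectively unlinks and recreates the destination rather than renaming, but either way the name ends up on a new inode):

    /* Sketch of the two update styles; error handling kept minimal. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static void show_inode(const char *path)
    {
        struct stat st;
        if (stat(path, &st) == 0)
            printf("%s: inode %llu\n", path, (unsigned long long)st.st_ino);
    }

    int main(void)
    {
        /* "cp" style: truncate and rewrite the existing file in place.
         * Same inode, so anyone who already mapped it sees the pages
         * change (or fault) underneath them. */
        int fd = open("libfoo.so", O_WRONLY | O_TRUNC);
        if (fd >= 0) {
            write(fd, "new contents\n", 13);
            close(fd);
        }
        show_inode("libfoo.so");

        /* "install" style: write a separate file, then move it over the
         * destination.  The name now points at a new inode; the old one
         * lives on until the last user closes/unmaps it. */
        fd = open("libfoo.so.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0755);
        if (fd >= 0) {
            write(fd, "new contents\n", 13);
            close(fd);
        }
        rename("libfoo.so.tmp", "libfoo.so");
        show_inode("libfoo.so");
        return 0;
    }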


Binaries are probably not mapped with MAP_SHARED, but I checked the mmap manual for MAP_PRIVATE and read: "It is unspecified whether changes made to the file after the mmap() call are visible in the mapped region."

I learn something every day...


This can also occur when memory mapping a file on Linux, which is similar-ish to the NFS issue. If you truncate a file that you've already memory mapped and then touch a page that now lies past the truncated end, you get SIGBUS.
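Rough sketch of that failure (assumes 4 KiB pages and a writable scratch file; error checking omitted):

    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("scratch.dat", O_RDWR | O_CREAT | O_TRUNC, 0644);
        ftruncate(fd, 8192);    /* two pages of backing file */
        char *p = mmap(NULL, 8192, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

        ftruncate(fd, 100);     /* shrink the file underneath the mapping */
        memset(p, 0, 4096);     /* first page still partially backed: OK */
        p[4096] = 'x';          /* second page now lies past EOF: SIGBUS */

        munmap(p, 8192);
        close(fd);
        return 0;
    }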


Wanted to comment the same. When I first learned about memory mapping I went "mmap() all the things!!" since it's so much easier than doing reads and writes all the time, checking for short reads, advancing the pointer and calling read again, handling EINTR, you name it.

But at least you do get proper error codes that you can handle in a somewhat sane way.

A read or write error on an mmapped file? SIGBUS, game over. Want to handle it? Install a signal handler for SIGBUS, call setjmp before every access to your mmapped region, and longjmp back from your signal handler. And you thought handling all the failure modes of read/write was ugly.
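Something like this (a sketch, not battle-tested; "data.bin" and the mapping length are placeholders):

    #include <fcntl.h>
    #include <setjmp.h>
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static sigjmp_buf jb;

    static void on_sigbus(int sig)
    {
        (void)sig;
        siglongjmp(jb, 1);              /* jump back to the access site */
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_sigbus;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGBUS, &sa, NULL);

        int fd = open("data.bin", O_RDONLY);
        size_t len = 4096;
        const char *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);

        /* Every access to the mapping needs this bracketing, and locals
         * modified between sigsetjmp and the access should be volatile. */
        if (sigsetjmp(jb, 1) == 0)
            printf("first byte: %d\n", map[0]);   /* faults if the page is gone */
        else
            fprintf(stderr, "SIGBUS while touching the mapping\n");

        munmap((void *)map, len);
        close(fd);
        return 0;
    }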

Use mmap if you absolutely need the performance. Otherwise just don't.


And even then mmap does not guarantee a speedup. Don't bother with mmap for small files, and for larger ones benchmark to see whether you really gain any performance.

As for the code, if you can just let it crash, prefer that to attempting to handle the error, because this kind of handling is even worse to test and debug than normal error-code paths.


Back almost 20 years ago, I worked on a medium-sized system - thousands of simultaneous users, millions of dollars in transactions daily - that was based on an mmap'ed flat file "database." It worked amazingly well. (Note that we did none of that sort of error handling!)


Yes, the first time I saw this described was in 1987 in a paper by A. Birrell et al. See [1]. It was also available as DEC SRC report number 24.

[1] A simple and efficient implementation of a small database, https://dl.acm.org/doi/abs/10.1145/37499.37517?download=true


Or if your FS is slightly corrupted and only figures out quite late that it does not know where part of the file is. Or if you suddenly lose access to the hard drive.

If you write your program carefully you can handle it, but it is way more difficult than handling error codes returned by a read function. You even have to fight the compiler, because it does not distinguish reads and writes to your manually mmapped data files from any other memory access. You may want to just let it crash, because after all your program itself is also mmapped, .text and .rodata sections included, so the mapping had better work correctly anyway -- but if the data files are on another partition and/or the application is required to be resilient, that excuse does not hold.


The lack of POSIX semantics when unlinking on NFS rears its ugly head in many more places. For example, the common atomic-write pattern that allows readers to keep reading a stale copy doesn't work anymore (you get ESTALE on I/O, or SIGBUS if it's mmapped), which means anything involving a frequently replaced file requires more workarounds than on any other filesystem.
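The pattern in question, sketched in C (path names are placeholders):

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Write to a temp file, then rename it over the destination.  On a local
     * fs, readers keep the old (now unlinked) file until they close it; on
     * NFS they may instead get ESTALE, or SIGBUS if they had it mmapped. */
    static int write_atomically(const char *path, const char *tmp,
                                const void *buf, size_t len)
    {
        int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        close(fd);
        return rename(tmp, path);
    }

    int main(void)
    {
        const char *data = "{\"version\": 2}\n";
        return write_atomically("config.json", "config.json.tmp",
                                data, strlen(data)) == 0 ? 0 : 1;
    }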


Isn't that what "silly rename" (on NFSv3; v4 doesn't need it?) is supposed to fix?

The problem the article mentions is overwriting a binary in place instead of renaming.


The article is about the atomic write pattern: create a tempfile, then move the tempfile over the original, which is effectively an unlink of the original.

And yes, this should be solved, but some NFS servers don't support it, e.g. AWS's EFS.


In my experience, you're better off avoiding NFS as much as possible. (Except perhaps when you're sharing a filesystem between VMs on the same machine.) Try something else, perhaps rsync, unless you know what you're doing. NFS over a VPN -- you're probably in for a rough ride.

In NFS, you can set mounts as 'hard' or 'soft'. If hard, errors will get you stuck until the share is back. You probably don't want that. If soft, you're slightly better off, but remember that the retry settings are all per-mount, and perhaps one size doesn't fit all.
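For reference, these knobs live in the per-mount options; the values here are purely illustrative, not recommendations:

    # /etc/fstab -- pick one style per mount
    # hard (the default): I/O blocks and retries until the server comes back
    server:/export  /mnt/data  nfs  hard                      0 0
    # soft: give up after retrans retries of timeo (tenths of a second) each
    # and return an I/O error to the application instead of hanging
    server:/export  /mnt/data  nfs  soft,timeo=30,retrans=3   0 0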

As far as I know, when NFS goes awry, you get the same or similar behavior to a hypothetical HDD/SSD that just inexplicably decided to no longer do anything for a while. Your processes will be in a D state and won't be killable for a potentially long time.


When I worked in BBN R&D back in the day, we used lots of NFS on a very large, fragile LAN built from 10BASE2, plus some sketchy AppleTalk hardware in a closet somewhere nearby.

Every now and then I’d know someone was in the closet because my transceiver’s light would peg and NFS was locked up. Someone had once again bumped the AppleTalk router.


Soft mounts can lead to silent data corruption -- chances are you don't want that.



