I'm observing all sorts of race conditions occurring in various VNOPs my custom filesystem implements.
I'm inclined to attribute this to my implementation not following the locking rules the system expects of a 3rd party filesystem as closely as it should.
I've looked at how locking is done in Apple's own implementation of Samba and NFS clients.
The Samba client uses read/write locks to protect its node from data races, while the NFS client uses mutex locks for the same purpose.
I realised that I don't have a clear model in my head of how locking should be done properly.
Thus my question: what are the locking rules for VFS and VNOP operations?
Thanks.
First off, a bit of a warning and a disclaimer. The VFS system has never been documented to any significant degree. It's a supported API in the sense that it is technically possible to create a functional VFS KEXT. However, VFS development is not something we've ever encouraged or actively supported. If you choose to use these APIs, you need to understand the state of what you're working with and the limits to the help you'll find.
So I assumed that macOS must have its own locking rules for VFS as well.
It would be helpful to understand what they are, so my filesystem can adhere to them properly.
The best statement of the "rules" I've found is in the EmptyFS ReadMe. It's too long to summarize, but the quick introduction is:
VFS handles the locking and reference counting needed for the structures it manages (mount points, vnodes, and so on). Your VFS plug-in can decide how to lock the structures it manages; VFS does not impose locking requirements on your plug-in. You can use mutexes or read/write locks explicitly, so you can get the maximum concurrency, or let VFS do most of your locking for you, which makes your code easier.
See the ReadMe for the more complete description.
On macOS, this doesn't seem sufficient. Thus my query.
What's actually happening? Locking issues in the VFS layer tend to result in hangs or panics, but it sounds like you're dealing with data corruption or presentation issues. That's generally caused by problems with your own locking, not the VFS layer.
__
Kevin Elliott
DTS Engineer, CoreOS/Hardware