NFS on VFS/ZFS with open(..., O_EXCL) ?

Having a peculiar issue trying to support the use of O_EXCL. (Fail if O_CREAT and file exists). It will fail the first time, then if the call is repeated, it works as expected.

It is not entirely clear how macOS should handle O_EXCL, it has been mentioned that vnop_create() should always return EEXIST - does that mean even in the success case, it should return EEXIST instead of 0? That seems odd.

Output of test program is:

# (1) Create the file with (O_WRONLY|O_CREAT).
open okay
write okay
close okay
86 -rw-r-----  1 501  0  29 Jan 12 17:08 /Volumes/BOOM/teest.out

Deleting /Volumes/BOOM/teest.out

# (2) Try creating with (O_WRONLY|O_CREAT|O_EXCL).
writef: Stale NFS file handle
436207628 87 ---------- 1 501 wheel 0 0 "Jul  9 07:53:53 2037" "Jan 12 17:09:02 2022" "Jan 12 17:09:02 2022" "Jan  1 09:00:00 1970" 1048576 0 0 /Volumes/BOOM/teest.out

So, since the file is deleted in between the tests, O_EXCL shouldn't really kick in here, and yet, something goes wrong.

The nfs server sends ESTALE to the nfs client. The dtrace stack is:

Stack:
              kernel.development`nfsrv_setattr+0x7c6
              kernel.development`nfssvc_nfsd+0xbdc
              kernel.development`nfssvc+0x106
              kernel.development`unix_syscall64+0x2ba
              kernel.development`hndl_unix_scall64+0x16

Result:
  0 259014              nfsrv_setattr: entry 
  0 259014       mac_vnode_check_open:entry 
  0 259015     hook_vnode_check_open:return 2 nfsd
  0 259015      mac_vnode_check_open:return 2 nfsd

  0 229396              nfsrv_rephead:entry 
             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 46 00 00 00                                      F...

So, nfssrv_setattr() replies with 0x46/70 (ESTALE) seemingly because the call hook_vnode_check_open() returns 2 (ENOENT).

Why though, the file was removed, I verified the cache has no entry. Then created again, confirmed it IS in the cache.

<zfs`zfs_vnop_remove (zfs_vnops_osx.c:1700)> zfs_vnop_remove error 0: checking cache: NOTFOUND
<zfs`zfs_vnop_create (zfs_vnops_osx.c:1427)> *** zfs_vnop_create: with 1: EXCL
<zfs`zfs_create (zfs_vnops_os.c:660)> zfs_create: zp is here 0x0
<zfs`zfs_vnop_create (zfs_vnops_osx.c:1458)> ** zfs_vnop_create created id 82
<zfs`zfs_vnop_create (zfs_vnops_osx.c:1475)> zfs_vnop_create error -1: checking cache: FOUND

I am having issues finding where the code for hook_vnode_check_open comes from anyway?

The failure call in nfs server is:

        if (!error && mac_vnode_check_open(ctx, vp, FREAD | FWRITE)) {
                error = ESTALE;
        }

So uh, why? If I let the test run again, this time the file exists, it returns EEXIST as expected.

If I run the first test twice, ie, without O_EXCL, both work. So it seems to only go wrong with O_EXCL, and file doesn't exist.

It is curious as to why nfs server figures out that exclusive is set, then clears va_mode?

                case NFS_CREATE_EXCLUSIVE:
                        exclusive_flag = 1;
                        if (vp == NULL) {
                                VATTR_SET(vap, va_mode, 0);

But doesn't use exclusive_flag until after calling VNOP_CREATE(), and it doesn't pass it either.

wouldn't let me post due to "content".

It is curious as to why nfs server figures out that exclusive is set, then clears va_mode?

Oh yeah, this came screaming back to me. So when NFS has to do exclusive, it attempts to create the file with metadata cleared, that is va_mode 0, and time. It uses atime to hold a create_verf (probably IP and counter), then uses getattr() to confirm the create_verf is matching (this client won). (va_flags has VA_UTIMES_NULL).

If that goes well, it calls setattr() with the correct information.

# We are about to create the file, with O_EXCL, no existing file.
  0 nfsrv_create:entry 
  0 zfs_vnop_getattr:entry 
  0 zfs_vnop_getattr:return 0 nfsd
  0 zfs_vnop_getattr:entry 
  0 zfs_vnop_getattr:return 0 nfsd
  0 zfs_vnop_getattr:entry 
  0 zfs_vnop_getattr:return 0 nfsd
  0 zfs_vnop_getattr:entry 
  0 zfs_vnop_getattr:return 0 nfsd
  0 zfs_vnop_getattr:entry 
  0 zfs_vnop_getattr:return 0 nfsd
# Which finally calls our vnop_create
  0 zfs_vnop_create:entry 
# we have created it!
  0 zfs_vnop_create:return 0 nfsd
# nfsrv_create now calls setattr - probably to set atime
  0 zfs_vnop_setattr:entry 
  0 zfs_vnop_setattr:return 0 nfsd
# done, and some getattr verify
  0 zfs_vnop_getattr:entry 
  0 zfs_vnop_getattr:return 0 nfsd
  0 zfs_vnop_getattr:entry 
  0 zfs_vnop_getattr:return 0 nfsd
  0 zfs_vnop_getattr:entry 
  0 zfs_vnop_getattr:return 0 nfsd
# send reply code to client
  0 nfsrv_rephead:entry 
             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 00 00 00 00                                      ....

# nfsrv_create returns success! We have created the file
  0 nfsrv_create:return 0 nfsd

# this setattr appears to be the next client request, unsure where it is from
# but probably to set mode/uid/gid
  0 nfsrv_setattr:entry 
  0 zfs_vnop_getattr:entry 
  0 zfs_vnop_getattr:return 0 nfsd
  0 zfs_vnop_getattr:entry 
  0 zfs_vnop_getattr:return 0 nfsd
  0 zfs_vnop_getattr:entry 
  0 zfs_vnop_getattr:return 0 nfsd
# nfsrv_setattr calls mac_vnode_check_open
  0 mac_vnode_check_open:entry 
  0 zfs_vnop_getattr:entry 
  0 zfs_vnop_getattr:return 0 nfsd
  0 zfs_vnop_getattr:entry 
  1 zfs_vnop_getattr:return 0 nfsd
# this setattr is about to fail.
  1 hook_vnode_check_open:return 2 nfsd
  1 hook_vnode_check_open:return 0 nfsd
  1 vng_vnode_check_open:return 0 nfsd
  1 mac_vnode_check_open:return 2 nfsd
  1 mac_vnode_check_open:return 2 nfsd
# mac_vnode_check_open failed, error = ESTALE
  1 zfs_vnop_getattr:entry 
  1 zfs_vnop_getattr:return 0 nfsd
  1 nfsrv_rephead:entry 
             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 46 00 00 00                                      F...

I guess if I knew where hook_vnode_check_open() is defined, I can perhaps figure out what goes wrong, but I get a bit lost in the MACF macros.

Running kernel.debug and dtrace - it looks like:

  1 1              nfsrv_setattr:entry 
  1 1       mac_vnode_check_open:entry 
  1 1    mac_cred_check_enforce:return 1 nfsd
  1 1             mac_label_get:return 0 nfsd
  1 1          mac_error_select:return 2 nfsd
  1 1          mac_error_select:return 2 nfsd
  1 1          mac_error_select:return 2 nfsd
  1 1 mac_policy_list_conditional_busy:return 0 nfsd
  1 1 kernel.debug`nfssvc_nfsd+0x108b
              kernel.debug`nfssvc+0x360
              kernel.debug`unix_syscall64+0x81f
              kernel.debug`hndl_unix_scall64+0x16

  1 1      mac_vnode_check_open:return 2 nfsd
  1 1              nfsrv_rephead:entry 
             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 46 00 00 00                                      F...

Which makes me thing that after the call to mac_label_get() it checks something and is not happy. I have not really looked into what vnode_label is but presumably has something to do with auth.

I have not really looked into what vnode_label is but presumably has something to do with auth.

Right. It’s part of the mandatory access control (MAC) subsystem. That is not KPI on Apple platforms, and thus not documented by us. However, a lot of the architecture is shared with other BSD platforms. If you want to learn more, check out The TrustedBSD MAC Framework.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

I'd be happy to not know it, if you know what I mean. But since NFS+O_EXCL isn't working right here, and yet, is working correctly with HFS, I just don't see why yet. HFS does have a handful of vnode_authorize() calls, of which I have zero. That is my current thinking anyway.

There is a bunch of debug stuff in MAC, is it trivial to turn on to perhaps see why it complains? Running kernel.debug 10.15.7

Thanks for replying,

OK, dtracing the debug kernel helped here, as certain functions were not inlined:

dtrace -n 'mac*:entry / execname == "nfsd" / {} ' -n 'mac*:return / execname == "nfsd" / { printf("%d %s", arg1, execname); }' -n 'nfs*:entry / execname == "nfsd" / {} ' -n 'nfs*:return / execname == "nfsd" / { printf("%d %s", arg1, execname); }' -n 'vnode*:entry / execname == "nfsd" / {} ' -n 'vnode*:return / execname == "nfsd" / { printf("%d %s", arg1, execname); }' -n 'vn*:entry / execname == "nfsd" / {} ' -n 'vn*:return / execname == "nfsd" / { printf("%d %s", arg1, execname); }'  -n 'nfsrv_rephead:entry { nd = arg0 + 0x88; tracemem(nd, 4); printf("rephead %d", arg0);}' 

Yeah, the whole thing. Anyway, it produced this hint:

  1 269114       mac_vnode_check_open:entry 
  1 269042     mac_cred_check_enforce:entry 
  1 269043    mac_cred_check_enforce:return 1 nfsd
  1 232364 vn_getpath_ext_with_mntlen:entry 
  1 231850    build_path_with_parent:return 2 nfsd
  1 232365 vn_getpath_ext_with_mntlen:return 2 nfsd
  1 232364 vn_getpath_ext_with_mntlen:entry 
  1 231850    build_path_with_parent:return 2 nfsd
  1 232365 vn_getpath_ext_with_mntlen:return 2 nfsd
  1 268848           mac_error_select:entry 
  1 268849          mac_error_select:return 2 nfsd
  1 269115      mac_vnode_check_open:return 2 nfsd

ie, it's not that mac_vnode_check_open() but rather that vn_getpath_ext_with_mntlen() fails. (Technically, build_path_with_parent()).

At the end of my zfs_vnop_create() - if I add the call:

                    vnode_update_identity(*ap->a_vpp, NULL,
                        (const char *)ap->a_cnp->cn_nameptr,
                        ap->a_cnp->cn_namelen, 0,
                        VNODE_UPDATE_NAME);

I get a successful test run using O_EXCL over nfs, finally.

I am not entirely sure why this is required, only place I call vnode_update_identity() is in vfs_vget(). Otherwise, I've "been getting away" with not calling it all this time. I assume it doesn't hurt to call it.

NFS on VFS/ZFS with open(..., O_EXCL) ?
 
 
Q