Curious spurious ENOENT errors from chown -R

This bug report is from Catalina, but we have confirmed it also happens on Big Sur; it is just tedious to do kext work on Big Sur.

The following process:
# zpool create mypool disk1
# chown -R lundman /Volumes/mypool
chown: /Volumes/mypool/.Spotlight-V100/Store-V2: No such file or directory
chown: /Volumes/mypool/.Spotlight-V100/VolumeConfiguration.plist: No such file or directory
chown: /Volumes/mypool/.fseventsd: No such file or directory


That is: create a new filesystem, mount it, run chown -R, and get errors. The set of files that fail stays the same across subsequent chown runs, but different files may fail if I re-create the filesystem.
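
To check mechanically whether the failing set really is stable, the paths can be pulled out of chown's stderr and compared between runs. A minimal sketch, assuming the error lines keep the "chown: PATH: MESSAGE" shape shown above; failed_paths, same_failures and the log file names are hypothetical helpers made up for illustration:

```shell
#!/bin/sh
# failed_paths: read chown error lines on stdin and print only the paths.
# Assumes the "chown: PATH: MESSAGE" format seen in the output above.
# LC_ALL=C keeps the sort order independent of the user's locale.
failed_paths() {
    sed -n 's/^chown: \(.*\): [^:]*$/\1/p' | LC_ALL=C sort
}

# same_failures LOG1 LOG2: succeed only if both runs failed on the same paths.
same_failures() {
    a=$(failed_paths < "$1")
    b=$(failed_paths < "$2")
    [ "$a" = "$b" ]
}
```

Usage would be something like `chown -R lundman /Volumes/mypool 2> run1.log`, the same again into run2.log, then `same_failures run1.log run2.log && echo "same set"`.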

Then do:

# ssh localhost
# chown -R lundman /Volumes/mypool


So if I ssh to the exact same machine, chown -R runs fine.
  • It behaves differently depending on whether I'm in a UI session or ssh'ed in (ssh from the same UI session or from a remote machine both fix it).

The files that error stat just fine, and you can chown them individually (without -R) just fine. Even after a successful chown -R over ssh, chown -R from the UI session still fails.
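
The per-file check can be scripted so it is easy to repeat in both the Terminal session and the ssh session. A minimal sketch; check_paths is a hypothetical helper, and it assumes the paths that chown -R complained about are passed in as arguments:

```shell
#!/bin/sh
# check_paths OWNER PATH...: for each path, try a plain stat and then a
# non-recursive chown, and print one result line per path.
check_paths() {
    owner=$1
    shift
    for p in "$@"; do
        if ! stat "$p" >/dev/null 2>&1; then
            echo "stat-failed $p"
        elif ! chown "$owner" "$p" 2>/dev/null; then
            echo "chown-failed $p"
        else
            echo "ok $p"
        fi
    done
}
```

For example, `check_paths lundman /Volumes/mypool/.fseventsd` run from both sessions shows whether the single-file operations differ between them, independent of the recursive walk.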

Digging as deep as I can with dtrace, I have traced it to:
lookup:return 2 chown
namei:return 2 chown
vn_open_auth:return 2 chown


So it isn't even reaching VNOP_LOOKUP() in my filesystem yet. (But perhaps readdir could be returning something bad?)

So triggering a panic when it is about to return ENOENT:

dtrace -wn 'lookup:return {printf("%d %s", arg1, execname); if (execname == "chown" && arg1 == 2 && val++ == 10) { printf("This one"); panic(); }}'

: mach_kernel : trap_from_kernel + 0x26
: mach_kernel : _lookup + 0x208
: mach_kernel : _namei + 0xea6
: mach_kernel : _nameiat + 0x75
: mach_kernel : _fstatat_internal + 0x147
: mach_kernel : _stat64 + 0x2f


frame #13: 0xffffff800489ff88 kernel.development`lookup(ndp=<unavailable>) at vfs_lookup.c:1457:1 [opt]
(lldb) p *ndp
(nameidata) $1 = {
ni_dirp = 140556031248840
ni_segflg = UIO_USERSPACE64
ni_op = OP_SETATTR
ni_startdir = 0x0000000000000000
ni_rootdir = 0xffffff801f23d700
ni_usedvp = 0x0000000000000000
ni_vp = 0x0000000000000000
ni_dvp = 0xffffff801f552700
ni_pathlen = 1
ni_next = 0xffffff8077d4bc1a <no value available>
ni_pathbuf = {
[0] = '.'
[1] = 'f'
[2] = 's'
[3] = 'e'
[4] = 'v'
[5] = 'e'
[6] = 'n'
[7] = 't'
[8] = 's'
[9] = 'd'
[10] = '\0'
[255] = '\0'
}
ni_loopcnt = 0
ni_cnd = {
cn_nameiop = 0
cn_flags = 1097792
cn_context = 0xffffff80262c2120
cn_ndp = 0xffffff8077d4bbc8
cn_pnbuf = 0xffffff8077d4bc10 ".fseventsd"
cn_pnlen = 256
cn_nameptr = 0xffffff8077d4bc10 ".fseventsd"
cn_namelen = 10
cn_hash = 1753311157
cn_consume = 0
}
ni_flag = 0
ni_ncgeneration = 0
}
(lldb) p *ndp->ni_cnd.cn_context
(vfs_context) $2 = {
vc_thread = 0xffffff80206b8550
vc_ucred = 0xffffff80254d1490
(lldb) p *ndp->ni_dvp
v_name = 0xffffff801f23b500 "Volumes"
(lldb) frame variable
(int) wantparent = 6
(int) docache = 1


Nothing stands out to my green eyes, but it is annoying that I cannot see most variables. It is time to boot kernel.debug instead.

Unfortunately, the chown -R failure does not reproduce when booting kernel.debug! D'oh. I re-created the filesystem and ran chown -R 4 times before it panicked with
xnu_debug/xnu-6153.101.5/osfmk/kern/thread.c:2535 Assertion failed: io_tier < IO_NUM_PRIORITIES called from _apfs_vnop_strategy() - probably unrelated.

I don't think I've come across a problem with my filesystem before that changed depending on whether I had ssh'ed in. Using the UI vs ssh presumably changes the context? But it must be related to my code, since it doesn't happen with hfs.





Replies

Oh huh, I *assumed* it does not fail on hfs and apfs, but it turns out it does. And ssh localhost makes it not fail. I get
Operation not permitted
  • Both hfs and apfs can give a chown -R error.

  • Neither hfs nor apfs gives an error if you ssh localhost.


Oh ah, I see. If I grant "Full Disk Access" to "Terminal" (sshd already has it), then it works. That isn't all that exciting, then.

Ah well, on to next mystery...