Ever since 10.15.5 (I think it was) brought in the new proc_lock_ APIs it has been quite easy to deadlock namei() lookups and mount at the same time.
Stack 1
*1000 unix_syscall64 + 698 (kernel.development + 9558170) [0xffffff8000b1d89a]
*1000 lstat64 + 47 (kernel.development + 4947279) [0xffffff80006b7d4f]
*1000 fstatat_internal + 327 (kernel.development + 4944567) [0xffffff80006b72b7]
*1000 nameiat + 117 (kernel.development + 4919557) [0xffffff80006b1105]
*1000 namei + 3857 (kernel.development + 4813841) [0xffffff8000697411]
*1000 lookup + 1842 (kernel.development + 4817810) [0xffffff8000698392]
*1000 lookup_handle_found_vnode + 677 (kernel.development + 4814677) [0xffffff8000697755]
*1000 vfs_busy + 79 (kernel.development + 4847775) [0xffffff800069f89f]
*1000 IORWLockRead + 738 (kernel.development + 3527154) [0xffffff800055d1f2]
Stack 2
1000 mount + 10 (libsystem_kernel.dylib + 41114) [0x7fff72fc109a]
*1000 hndl_unix_scall64 + 22 (kernel.development + 1622534) [0xffffff800038c206]
*1000 unix_syscall64 + 698 (kernel.development + 9558170) [0xffffff8000b1d89a]
*1000 mount + 78 (kernel.development + 4901838) [0xffffff80006acbce]
*1000 __mac_mount + 1330 (kernel.development + 4903186) [0xffffff80006ad112]
*1000 mount_common + 4860 (kernel.development + 4897964) [0xffffff80006abcac]
*1000 checkdirs + 115 (kernel.development + 4901059) [0xffffff80006ac8c3]
*1000 proc_iterate + 892 (kernel.development + 8110892) [0xffffff80009bc32c]
*1000 checkdirs_callback + 139 (kernel.development + 4901547) [0xffffff80006acaab]
*1000 IORWLockWrite + 1240 (kernel.development + 3528664) [0xffffff800055d7d8]
The mount call will vfs_busy() and then wait for proc_dirs_lock_exclusive() (IORWLockWrite).
The stat, on the other hand, grabs proc_dirs_lock_share() in namei() and then, because it needs to cross a mountpoint, calls lookup_traverse_mountpoints(), which calls vfs_busy().
Classic A-B, B-A deadlock.
I am having a hard time trying to 1) avoid it, or 2) detect that it is about to happen, since everything is opaque, and settings like NOCROSSMNT are not something I can set.