When recursively removing a directory with a large number of entries that resides on my custom filesystem, rm(1) aborts with ENOENT.
% rm -Rv /Volumes/myfs/linux-kernel/linux-6.10.6
[...]
/Volumes/myfs/linux-kernel/linux-6.10.6/include/drm/bridge/aux-bridge.h
/Volumes/myfs/linux-kernel/linux-6.10.6/include/drm/bridge/dw_hdmi.h
rm: fts_read: No such file or directory
I'm observing the following sequence of calls being made.
2024-09-17 17:58:25 vnops.c:281 myfs_vnop_lookup: rm-936 -> dw_hdmi.h ;initial lookup call
2024-09-17 17:58:25 vnops.c:315 myfs_vnop_lookup: -> cache_lookup(dw_hdmi.h)
2024-09-17 17:58:25 vnops.c:317 myfs_vnop_lookup: <- cache_lookup(dw_hdmi.h) -> 0 ;cache miss
2024-09-17 17:58:25 rpc.c:431 myfsLookup: rm-936 -> dw_hdmi.h ;do remote lookup
2024-09-17 17:58:25 rpc.c:500 myfsLookup: -> myfs_lookup_rpc(dw_hdmi.h)
2024-09-17 17:58:25 rpc.c:502 myfsLookup: <- myfs_lookup_rpc(dw_hdmi.h) -> 0 ;file found and added to vfs cache
2024-09-17 17:58:25 vnops.c:281 myfs_vnop_lookup: rm-936 -> dw_hdmi.h ;subsequent lookup call
2024-09-17 17:58:25 vnops.c:315 myfs_vnop_lookup: -> cache_lookup(dw_hdmi.h)
2024-09-17 17:58:25 vnops.c:317 myfs_vnop_lookup: <- cache_lookup(dw_hdmi.h) -> -1 ;cache hit
2024-09-17 17:58:25 vnops.c:1478 myfs_vnop_remove: -> myfs_unlink(dw_hdmi.h) ;unlink sequence
2024-09-17 17:58:25 rpc.c:1992 myfs_unlink: -> myfs_unlink_rpc(dw_hdmi.h)
2024-09-17 17:58:25 rpc.c:1994 myfs_unlink: <- myfs_unlink_rpc(dw_hdmi.h) -> 0 ;remote unlink succeeded
2024-09-17 17:58:25 vnops.c:1480 myfs_vnop_remove: <- myfs_unlink(dw_hdmi.h) -> 0
2024-09-17 17:58:25 vnops.c:1487 myfs_vnop_remove: -> cache_purge(dw_hdmi.h)
2024-09-17 17:58:25 vnops.c:1489 myfs_vnop_remove: <- cache_purge(dw_hdmi.h)
2024-09-17 17:58:25 vnops.c:1499 myfs_vnop_remove: -> vnode_recycle(dw_hdmi.h)
2024-09-17 17:58:25 vnops.c:1501 myfs_vnop_remove: <- vnode_recycle(dw_hdmi.h)
2024-09-17 17:58:27 vnops.c:281 myfs_vnop_lookup: fseventsd-101 -> dw_hdmi.h ;another lookup; why?
2024-09-17 17:58:27 vnops.c:315 myfs_vnop_lookup: -> cache_lookup(dw_hdmi.h)
2024-09-17 17:58:27 vnops.c:317 myfs_vnop_lookup: <- cache_lookup(dw_hdmi.h) -> 0
2024-09-17 17:58:27 rpc.c:431 myfsLookup: fseventsd-101 -> dw_hdmi.h
2024-09-17 17:58:27 rpc.c:500 myfsLookup: -> myfs_lookup_rpc(dw_hdmi.h)
2024-09-17 17:58:27 rpc.c:502 myfsLookup: <- myfs_lookup_rpc(dw_hdmi.h) -> ENOENT(2)
2024-09-17 17:58:27 vnops.c:371 myfs_vnop_lookup: SET(NNEGNCENTRIES): dw_hdmi.h
2024-09-17 17:58:27 vnops.c:373 myfs_vnop_lookup: ENOENT(2) <- shouldAddToNegativeNameCache(dw_hdmi.h)
I checked the value of vnode's v_iocount when vnop_remove and vnop_reclaim are being called.
Each vnop_remove and followed by vnop_reclaim with v_iocount set to 1 in both calls, as expected.
What I don't understand is why after removing the file is there another lookup call being made, which returns ENOENT to rm(1), which causes it to abort.
Any pointers on what could be amiss there would be much appreciated.
When recursively removing a directory with a large number of entries that resides on my custom filesystem, rm(1) aborts with ENOENT.
My psychic debugging sense™ says that your custom file systems backing store is another "standard" file system and you implemented VNOP_REMOVE by simply deleting the target file/directory.
The problem with that approach is that it simply isn't how any Unix file system operates. Unix allows the deletion of open objects, which means it's entirely valid to have (or lookup) a deleted object.
I checked the value of vnode's v_iocount when vnop_remove and vnop_reclaim are being called.
What about "v_usecount" (and "v_kusecount")? Those are what track actual references to the vnode.
What I don't understand is why after removing the file is there another lookup call being made, which returns ENOENT to rm(1), which causes it to abort.
Any pointers on what could be amiss there would be much appreciated.
Two major recommendations:
-
Have you directly tested the basic behavior of the underlying system calls like "unlink", "rmdir", etc? Both in the simple case (for example, create a file, delete the file) and in more complicated cases (for example, create file, open file, remove file, close file)? Many developers get something running, typically through something like FUSE, and then immediately "jump" higher level functionality like command line tools or the Finder without ever verifying that the "basic" systems calls actually work.
-
If your basic testing works fine, then you can find the source to "rm" in the "file_cmds" project, so you can build and test it yourself.
__
Kevin Elliott
DTS Engineer, CoreOS/Hardware