Of course, but that is not what is happening here. At most one file is being created, a .DS_Store file. Having encountered this problem before in a strictly local context, I changed my program to retry and that has worked (with local files).
If the problem is APFS delaying the deletion of the directory, then perhaps I need a longer timeout on my retry loop?
To clarify, I don't think APFS itself is a key factor. What is your current retry timing? Comparing local and network volumes, much longer timeouts are often required for a network file system.
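To illustrate the timing point, here's a minimal retry sketch (illustrative names and timings only, not a recommendation): with exponential backoff, a handful of attempts can cover the much longer settle times a network volume may need without hammering the server.

```python
import shutil
import time

def remove_dir_with_retry(path, attempts=8, initial_delay=0.25):
    """Try to delete a directory tree, backing off between attempts.

    Network volumes can take much longer than local disks to settle,
    so the total wait with these defaults (~30s) is deliberately
    generous. All names and timings here are illustrative.
    """
    delay = initial_delay
    for attempt in range(attempts):
        try:
            shutil.rmtree(path)
            return True
        except OSError:
            if attempt == attempts - 1:
                raise  # out of retries; surface the real error
            time.sleep(delay)
            delay *= 2  # backoff: 0.25s, 0.5s, 1s, 2s, ...
    return False
```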
The real problem is not the failure to delete the directory, it is inconsistent state that is created (as viewed by a network client).
What does "viewed" here actually mean? How is the user actually "looking at" the file/directory?
The complication is around what exactly happened here:
Apparently, the file server believes the directory exists (it shows up when remotely listing the parent) even after the directory ceases to exist (as viewed on the server file system).
The question here is which of these actually happened:
1) The server returned old data when the client asked.
OR
2) The client displayed old data and never asked the server.
Both of those are possible, but I'd generally consider #2 more likely than #1. The server has relatively "free" access* to the underlying data, so it's easy for it to just check the disk and see what's there. By contrast, the client is ACTIVELY trying to avoid server communication, which creates "space" for this kind of failure. Note that #2 doesn't necessarily mean the problem is the client. There's a complicated architecture here where the client is supposed to tell the server about directories it's "interested in" and the server is supposed to tell the client about changes to those location(s), so failure is possible in both components.
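The failure mode in #2 can be made concrete with a toy model of the "client caches, server notifies" pattern (purely illustrative code, not how the actual SMB client is implemented): if the interest registration is lost on either side, the client keeps serving its stale cache forever.

```python
# Toy model (illustrative only) of the architecture described above:
# the client caches listings and only refreshes when the server
# notifies it, so a lost "interest" registration means permanently
# stale data with neither side realizing anything is wrong.

class Server:
    def __init__(self):
        self.listing = {"parent": ["app-dir"]}
        self.interested = set()  # clients registered for change events

    def delete(self, entry):
        self.listing["parent"].remove(entry)
        for client in self.interested:  # only registered clients hear
            client.invalidate("parent")

class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {"parent": list(server.listing["parent"])}

    def invalidate(self, path):
        self.cache[path] = list(self.server.listing[path])

    def list(self, path):
        return self.cache[path]  # never re-asks the server

server = Server()
stale = Client(server)          # registration lost somewhere
fresh = Client(server)
server.interested.add(fresh)    # registration succeeded

server.delete("app-dir")
# fresh now sees an empty directory; stale still shows "app-dir"
```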
That leads back to an earlier issue here:
Using Unix tools on the client to view or delete the directory fails. This failure still occurs even now.
The big thing that still isn't clear to me is the difference between:
- Part of the system (your app, the terminal, etc.) is seeing weird/old data.
vs
- The "entire" system is seeing weird/old data, particularly the Finder and ESPECIALLY across a mount/unmount cycle.
I can see how the first case might happen depending on EXACTLY how the directory is being examined and manipulated. Whatever is viewing that data ends up in a state where the client has bad data AND the server doesn't know that the client is "interested" in that directory. The client keeps showing the bad data and the server never realizes there was any reason to update the client.
The second case is much, much stranger. Unmounting resets the client's "world" and, more importantly, I'd expect it to basically reset the server as well.
Moving to here:
The directories are not in the home directory. They are on an external SSD volume.
What's the file system on the SSD? APFS, HFS+, or something else?
At the very least, when the network client tries to list or delete this directory,
Yes and no. On the listing side, no, not necessarily. It may just be returning its local data and relying on the server to notify it of changes (which it should be). On the delete side... what's the specific command/function you're executing? It's possible it's effectively doing an "ls" first, which means it would fail in exactly the same way.
the file server should notice that the directory no longer exists, update its state and return a nonexistent file error. Instead, it returns a permissions error but does not update its state, so any future attempt to list or delete the directory will get the same error.
As I mentioned above, I would not assume that the server is the failure point here.
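To make the "delete may do an ls first" point concrete, here's a Python example (the same distinction exists with most high-level vs. low-level delete APIs): a recursive delete like shutil.rmtree has to enumerate the directory before removing anything, so it depends on the same listing path that "ls" uses, while a bare rmdir issues a single remove call with no enumeration at all.

```python
import os
import shutil
import tempfile

parent = tempfile.mkdtemp()
victim = os.path.join(parent, "stale-dir")

# Style 1: enumerate-then-delete. shutil.rmtree lists the directory's
# contents first, so it exercises the same (possibly cached) listing
# machinery that a plain "ls" does, and can fail the same way.
os.mkdir(victim)
shutil.rmtree(victim)

# Style 2: a single rmdir(2) call with no enumeration. This only
# works on an empty directory, but it never asks for a listing.
os.mkdir(victim)
os.rmdir(victim)
```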
I probably will modify my program to move the application before trying to delete it. I don't like the idea of adding macOS specific code to an application that is currently not OS specific, but that may be the only option. I believe that macOS specific code is needed to find a location (trash) where the Finder (or APFS) will not continue to try to calculate the application size. (Does moving the application to the trash stop APFS from recalculating its size?)
Clarifying here, I don't think the size calculation is a factor here at all. Fast directory sizing is part of APFS's internal implementation and won't have any effect on the kind of issue you're describing. I only mentioned it to explain how the Finder could be getting an accurate directory size without a bunch of extra directory iterations.
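For what it's worth, the "move first" idea doesn't have to be macOS specific. A portable variant is to rename the directory to a unique sibling name first (rename is a single operation that doesn't enumerate the directory), then recursively delete the renamed copy. A sketch, with illustrative names and no error handling:

```python
import os
import shutil
import uuid

def move_then_delete(path):
    """Rename the directory aside, then delete the renamed copy.

    os.rename is one filesystem operation that never lists the
    directory, so the original name disappears immediately and the
    slower recursive delete runs on a name nothing else is using.
    Sketch only -- error handling and cross-volume moves omitted.
    """
    doomed = path + ".deleting-" + uuid.uuid4().hex
    os.rename(path, doomed)
    shutil.rmtree(doomed)
```

Note that os.rename only works within a single volume, which is fine here since the temporary name is a sibling of the original.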
Also, is the Finder actively open and monitoring these directories while all this is happening? The Finder doesn't normally create a .DS_Store file for directories it isn't actually interacting with.
__
Kevin Elliott
DTS Engineer, CoreOS/Hardware