Hi.
I develop an EndpointSecurity client (running as root in the system domain - a LaunchDaemon defined in /Library/LaunchDaemons/com.company.daemon-name.plist).
I subscribe to a few "AUTH" events, which I handle on a high-priority concurrent dispatch_queue, using fairly efficient code, to ensure I meet the EndpointSecurity message deadlines. These are the event types I subscribe to:
ES_EVENT_TYPE_AUTH_EXEC,
ES_EVENT_TYPE_AUTH_OPEN,
ES_EVENT_TYPE_AUTH_CREATE,
ES_EVENT_TYPE_AUTH_CLONE,
ES_EVENT_TYPE_AUTH_RENAME,
ES_EVENT_TYPE_AUTH_EXCHANGEDATA
My daemon seems to handle events under very high loads without a hitch. It usually takes around 0.1%-1% of the CPU and almost never more than 10-20MB of RAM - it's very lightweight, and works fine.
HOWEVER - on some customer Macs (Enterprise Macs with lots of IT background processes on them - antivirus packages, software-updaters, and remote-control tools), I see crash logs of my tool, usually occurring when the Mac is unattended (late at night, or while the Mac is asleep). All these crash logs have the following in common.
The crash reason:
Exception Codes: 0x0000000000000000, 0x0000000000000000
Termination Reason: Namespace ENDPOINTSECURITY, Code 2 EndpointSecurity client terminated because it failed to respond to a message before its deadline
I've gathered a fair amount of statistics, and the minimum deadlines I get are ~30 seconds (LOTS OF TIME!!!), whereas my code usually takes no more than 10 milliseconds to respond to the EndpointSecurity framework.
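For reference, here is a minimal sketch of how the time left before a message deadline could be computed. The function names and the safety margin are hypothetical, not part of EndpointSecurity; the only assumption about the framework is that the message deadline is expressed in mach_absolute_time() ticks, which are converted to nanoseconds with the numer/denom pair reported by mach_timebase_info().

```c
#include <assert.h>
#include <stdint.h>

/* Convert a tick delta to nanoseconds using a numer/denom timebase
 * (the shape that mach_timebase_info() reports on macOS). */
static uint64_t ticks_to_ns(uint64_t ticks, uint32_t numer, uint32_t denom) {
    assert(denom != 0);
    /* No overflow guard: adequate for deltas of seconds to minutes. */
    return ticks * numer / denom;
}

/* Hypothetical helper: nanoseconds left before the deadline, minus a
 * safety margin reserved for sending the response. Returns 0 when the
 * deadline has passed or the margin is already used up. */
uint64_t remaining_budget_ns(uint64_t now_ticks, uint64_t deadline_ticks,
                             uint32_t numer, uint32_t denom,
                             uint64_t safety_margin_ns) {
    if (deadline_ticks <= now_ticks) {
        return 0;
    }
    uint64_t left_ns = ticks_to_ns(deadline_ticks - now_ticks, numer, denom);
    return left_ns > safety_margin_ns ? left_ns - safety_margin_ns : 0;
}
```

In a real client, now_ticks would come from mach_absolute_time() and deadline_ticks from the deadline field of the es_message_t being handled. Logging this budget just before each slow call would show how close the handler really gets to the deadline on the affected Macs.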
But the crash log also shows that all "working threads" (code-blocks on my Event-Handling dispatch_queue) are stuck in the same OS call - namely:
6 AppKit 0x7ff80db366b4 -[NSWorkspace isFilePackageAtPath:] + 104
In most cases I have 6 or 8 such concurrent blocks pending. That frame is not the top of the stack - the full stacks all look like this:
Thread 4:: Dispatch queue: Event Handling Queue
0 libsystem_kernel.dylib 0x7ff809f94e0e __getattrlist + 10
1 CoreServicesInternal 0x7ff80cffbe98 corePropertyProviderPrepareValues(__CFURL const*, __FileCache*, __CFString const* const*, void const**, long, void const*, __CFError**) + 798
2 CoreServicesInternal 0x7ff80cffbb19 prepareValuesForBitmap(__CFURL const*, __FileCache*, _FilePropertyBitmap*, __CFError**) + 394
3 CoreServicesInternal 0x7ff80cff8421 _FSURLCopyResourcePropertyForKeyInternal(__CFURL const*, __CFString const*, void*, void*, __CFError**, unsigned char) + 277
4 CoreFoundation 0x7ff80a084e50 CFURLCopyResourcePropertyForKey + 96
5 CoreFoundation 0x7ff80a09904e -[NSURL getResourceValue:forKey:error:] + 110
6 AppKit 0x7ff80db366b4 -[NSWorkspace isFilePackageAtPath:] + 104
7 ITProtector 0x10ed8da73 0x10ed58000 + 219763
The bottom frame (frame 7) is my code, which calls NSWorkspace to determine whether a file I need to authorize is a bundle or not.
My conclusion is that the call hangs forever because (maybe?) LaunchServices or the file-system service is busy or pushing back - I don't know, and I can't reproduce this on any of my Macs. As I said, it happens randomly on customer Macs when they're unattended.
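One idea I am considering is a first-pass check that never touches the file system, so it cannot block. The sketch below is only an approximation: the function name and the extension list are my own assumptions, and it does not replicate -[NSWorkspace isFilePackageAtPath:], which also honors the package bit and package types declared by installed apps.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical pre-filter: decides from the path string alone whether the
 * last path component carries a well-known package extension.
 * It performs no I/O, so it cannot hang on a busy file system. */
bool path_looks_like_package(const char *path) {
    /* Assumed list; extend it to match the bundles you care about. */
    static const char *const kPackageExtensions[] = {
        ".app", ".bundle", ".framework", ".plugin", ".kext"
    };
    if (path == NULL) {
        return false;
    }
    size_t len = strlen(path);
    while (len > 1 && path[len - 1] == '/') {
        len--;  /* ignore trailing slashes */
    }
    size_t count = sizeof kPackageExtensions / sizeof kPackageExtensions[0];
    for (size_t i = 0; i < count; i++) {
        size_t ext_len = strlen(kPackageExtensions[i]);
        if (len > ext_len &&
            strncasecmp(path + len - ext_len, kPackageExtensions[i], ext_len) == 0 &&
            path[len - ext_len - 1] != '/') {  /* require a non-empty base name */
            return true;
        }
    }
    return false;
}
```

Paths that this check accepts or rejects with certainty for my purposes would skip the NSWorkspace call entirely; only the remaining ones would still need the slower, blocking query.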
I now have two distinct problems using -[NSWorkspace isFilePackageAtPath:] (or any alternative I have found so far):
- The call is usually very fast - but now I can't know in advance how much time it will take.
- The call and all its alternatives are synchronous. I haven't found an asynchronous replacement I could call in order to introduce my own "timeouts".
I need help here - what can cause this API to hang for over a minute? Where should I start looking? And lastly - what OS activity can impose such a long block on this fairly basic file-system query?
Any idea or suggestion or lead would be greatly appreciated.
Thanks!