Hang in -[NSWorkspace isFilePackageAtPath:] causes macOS to crash/kill my EndpointSecurity client daemon

Hi.

I develop an EndpointSecurity client, running as root in the system domain: a launch daemon defined in /Library/LaunchDaemons/com.company.daemon-name.plist.

I register for a few "AUTH" events, which I handle on a high-priority concurrent dispatch queue using fairly efficient code, to ensure I meet the EndpointSecurity message deadlines. These are the event types I register for:

ES_EVENT_TYPE_AUTH_EXEC,  
ES_EVENT_TYPE_AUTH_OPEN, 
ES_EVENT_TYPE_AUTH_CREATE,
ES_EVENT_TYPE_AUTH_CLONE,
ES_EVENT_TYPE_AUTH_RENAME,
ES_EVENT_TYPE_AUTH_EXCHANGEDATA

My daemon seems to handle events under very high load without a hitch. It usually takes around 0.1%-1% of the CPU and almost never more than 10-20MB of RAM. It's very lightweight and works fine.

HOWEVER - on some customer Macs (enterprise Macs with lots of IT background processes on them: antivirus packages, software updaters, and remote-control tools), I see crash logs of my tool, usually occurring when the Mac is unattended (late at night, or while the Mac is asleep). They all have the following in common.

The crash reason:

Exception Codes:       0x0000000000000000, 0x0000000000000000

Termination Reason:    Namespace ENDPOINTSECURITY, Code 2 EndpointSecurity client terminated because it failed to respond to a message before its deadline

I've gathered statistics, and the minimum deadlines I get are ~30 seconds (LOTS OF TIME!!!), whereas my code usually takes no more than 10 milliseconds to respond to the EndpointSecurity framework.
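
For reference, here is a minimal sketch of the arithmetic behind such a measurement. It assumes the `deadline` field of `es_message_t`, which is expressed in Mach absolute time units, and a numerator/denominator pair as filled in by `mach_timebase_info()`. The helper name `remaining_ns` is hypothetical, and the tick values are passed in as plain integers so the conversion can be checked on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Nanoseconds left until `deadline_ticks`, or 0 if the deadline has passed.
 *
 * On macOS the inputs would come from (assumed usage, not shown here):
 *   deadline_ticks = msg->deadline;          // es_message_t
 *   now_ticks      = mach_absolute_time();
 *   numer, denom   = fields of mach_timebase_info_data_t
 */
uint64_t remaining_ns(uint64_t deadline_ticks, uint64_t now_ticks,
                      uint32_t numer, uint32_t denom)
{
    assert(denom != 0);
    if (deadline_ticks <= now_ticks) {
        return 0;
    }
    uint64_t ticks = deadline_ticks - now_ticks;
    /* Split the division so ticks * numer cannot overflow for large values. */
    uint64_t whole = ticks / denom;
    uint64_t rest  = ticks % denom;
    return whole * numer + (rest * numer) / denom;
}
```

Logging this value on entry to the handler and again just before responding would show both the budget each message arrives with and how much of it the handler actually uses.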

But the crash log also shows that all "working threads" (code-blocks on my Event-Handling dispatch_queue) are stuck in the same OS call - namely:

6   AppKit                        	    0x7ff80db366b4 -[NSWorkspace isFilePackageAtPath:] + 104

In most cases I have 6 or 8 such concurrent blocks pending. This is not the bottom of the stack - they all look like this:

Thread 4::  Dispatch queue: Event Handling Queue
0   libsystem_kernel.dylib        	    0x7ff809f94e0e __getattrlist + 10
1   CoreServicesInternal          	    0x7ff80cffbe98 corePropertyProviderPrepareValues(__CFURL const*, __FileCache*, __CFString const* const*, void const**, long, void const*, __CFError**) + 798
2   CoreServicesInternal          	    0x7ff80cffbb19 prepareValuesForBitmap(__CFURL const*, __FileCache*, _FilePropertyBitmap*, __CFError**) + 394
3   CoreServicesInternal          	    0x7ff80cff8421 _FSURLCopyResourcePropertyForKeyInternal(__CFURL const*, __CFString const*, void*, void*, __CFError**, unsigned char) + 277
4   CoreFoundation                	    0x7ff80a084e50 CFURLCopyResourcePropertyForKey + 96
5   CoreFoundation                	    0x7ff80a09904e -[NSURL getResourceValue:forKey:error:] + 110
6   AppKit                        	    0x7ff80db366b4 -[NSWorkspace isFilePackageAtPath:] + 104
7   ITProtector                   	       0x10ed8da73 0x10ed58000 + 219763

The bottom frame is my code, which calls NSWorkspace to determine whether a file I need to authorize is a bundle.

My conclusion is that the call hangs forever because (maybe?) LaunchServices or the file system is busy or pushing back. I don't know, and I can't reproduce it on any of my Macs. As I said, this happens randomly on customer Macs when they're unattended.

I now have two distinct problems using -[NSWorkspace isFilePackageAtPath:] (or any alternative I found so far).

  • The call is usually very fast - but now I can't know in advance how much time it will take.
  • The call and all its alternatives are synchronous. I haven't found an asynchronous replacement I could call to introduce my own timeouts.

I need help here. What can cause this API to hang for over a minute? Where to start looking? And lastly, what OS activity can impose such a long block on this fairly basic file-system query?

Any idea or suggestion or lead would be greatly appreciated.

Thanks!

Replies

Where to start looking?

You need a spin dump. The best way to get that is to trigger a sysdiagnose log immediately after your process has been killed in this way. That’s going to be hard to do by hand in your case, but you can probably create some sort of automation to do it. Assuming you have a coöperative user who’ll run that automation for you.

what can cause this API to hang for over a minute?

Given the context you described, there’s probably some other tool that’s deadlocking against your ES client (until the system steps in and breaks the deadlock by killing your ES client). I regularly see duelling ES clients in enterprise environments like this.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

  • Thank you very much. Fortunately I have a very cooperative customer with someone eager to help resolve the issue. However - he sees the problem the next morning. It happens when he's home, and his office-Mac is asleep. Can sysdiagnose help if it's run hours after the issue happened?

  • I looked again into my code just to be sure. The first thing in my ES message-handling code-block is:

        if (msg->process->is_es_client) return YES; // Immediately authorize this message - clear any other ES-client process

    So actual "deadlocking" by another ES client is not likely.


@eskimo

OK... I got a huge (~380MB) sysdiagnose archive, taken about 1 minute after my ES client was killed by the OS. What now? What should I be looking for, and how do I even open and make sense of this huge archive?

@suMac ... I am facing exactly the same issue with the ForcePoint ESDaemonBundle process... It not only crashes the daemon process, but macOS itself also crashes and restarts. Did you find a solution to your problem?

  • Sad to say I didn't. I found a workaround though.

    I added a concurrent operation queue on which I perform the calls to -[NSWorkspace isFilePackageAtPath:]. I also implemented an "asynchronous NSOperation" for my main ES evaluation that supports a timeout (based on the ES deadlines).

    My AUTH event handler asynchronously dispatches -[NSWorkspace isFilePackageAtPath:] and blocks awaiting the result, up to my timeout. Now my ES client is never kicked.
