Endpoint security daemon causing shutdown hang

I have an application that uses Endpoint Security and runs as a launchd service. It protects itself from user termination by intercepting signals addressed to the daemon and only allowing a whitelisted group of processes (including launchd) to send them. launchd is set to keep the process alive.

When I shut down the machine, I get a long hang, similar to what happens when an Endpoint Security client deadlocks while processing an event that itself causes an auth event to be created for the target process. There is nothing interesting in the unified logs. I suspect I might be shutting down the Endpoint Security client in the wrong order when I intercept the signals delivered during system shutdown. I have tried various combinations of allowing and denying the kill-my-daemon signal from launchd (I never see the SIGTERM, but I do see the SIGKILL), with varying levels of pauses and hangs, but I only get a quick shutdown on a few occasions. I do mute notifications for my daemon so as not to deadlock myself.

I’m thinking there is a better way to approach this problem, but the method escapes me. How can my process best protect itself from users terminating it while still letting the OS shut down quickly, without getting stuck waiting for notifications to expire for a killed process? I very much appreciate the community’s insights. These are some interesting problems caused by doing this work in user space!

Replies

I found a terrible solution that works for us at this time. I'm thinking this is a gap that should be addressed in the Endpoint Security framework. Essentially, if you create a daemon that uses the framework and runs under launchd, the app will get terminated on shutdown with no opportunity to unsubscribe, leaving the system hanging while it waits for the app to respond. This is the message you will see in the log if you are fortunate enough not to completely deadlock the system:

---------------------------------------------

2020-02-06 23:49:40.229877-0500 localhost kernel[0]: (EndpointSecurity) Client did not respond in appropriate amount of time (client pid: 121)

2020-02-06 23:49:40.229924-0500 localhost kernel[0]: (EndpointSecurity) Client did not respond in appropriate amount of time (client pid: 121)

---------------------------------------------


My hacky solution is to intercept successful NOTIFY_EXIT events for a service that shuts down with the system and unsubscribe when I see that. This is likely to break between OS releases, so I very much appreciate advice on a longer-term fix...

---------------------------------------------


if (msg->event_type == ES_EVENT_TYPE_NOTIFY_EXIT &&
    msg->process->is_platform_binary &&
    [@"com.apple.kextd" isEqualToString:esstring_to_nsstring(&msg->process->signing_id)] &&
    msg->event.exit.stat == 0) {
    es_unsubscribe_all(g_client);
    LOG_INFO("Saw kextd exit");
}
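
If more than one matching exit event can arrive, the unsubscribe could be guarded so that it only fires once. The following is a minimal sketch in plain C of such a one-shot latch, under stated assumptions: `exit_info_t`, `is_shutdown_marker`, `should_unsubscribe_now`, and `g_unsubscribed` are hypothetical names standing in for the fields read from `msg` in the snippet above, and the caller would still call `es_unsubscribe_all(g_client)` itself when the function returns true.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the fields read from the ES message above. */
typedef struct {
    bool is_platform_binary;
    const char *signing_id;
    int exit_status;
} exit_info_t;

/* Set once the shutdown marker has been seen. */
atomic_flag g_unsubscribed = ATOMIC_FLAG_INIT;

/* Same predicate as the snippet above: a platform binary signed as
   com.apple.kextd that exited with status 0. */
bool is_shutdown_marker(const exit_info_t *info) {
    return info->is_platform_binary &&
           info->signing_id != NULL &&
           strcmp(info->signing_id, "com.apple.kextd") == 0 &&
           info->exit_status == 0;
}

/* Returns true only for the first matching exit event; the caller would
   then call es_unsubscribe_all(g_client). */
bool should_unsubscribe_now(const exit_info_t *info) {
    if (!is_shutdown_marker(info)) {
        return false;
    }
    return !atomic_flag_test_and_set(&g_unsubscribed);
}
```

Keeping the predicate separate from the latch means the kextd check, which is the part most likely to break between OS releases, lives in one place.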

Hi,


Can you share the code that you use to protect your app?


Thanks,

Rony.

Here is the gist of it...


```
case ES_EVENT_TYPE_AUTH_SIGNAL:
    // Signals sent by launchd are always allowed.
    if (msg->process->is_platform_binary &&
        [@"com.apple.xpc.launchd" isEqualToString:esstring_to_nsstring(&msg->process->signing_id)]) {
        return ES_AUTH_RESULT_ALLOW;
    }
    // Any other sender is denied when the signal targets this daemon.
    if ([[NSProcessInfo processInfo] processIdentifier] == audit_token_to_pid(msg->event.signal.target->audit_token)) {
        return ES_AUTH_RESULT_DENY;
    }

    // Signals aimed at other processes pass through.
    return ES_AUTH_RESULT_ALLOW;
```
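
The same policy can be written as a pure function, which makes it possible to check the allow/deny cases without an Endpoint Security client. This is only a sketch under assumptions: `signal_request_t`, `signal_verdict_t`, and `decide_signal` are hypothetical names, and the struct fields mirror the values the handler above reads from `msg`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

typedef enum { SIGNAL_ALLOW, SIGNAL_DENY } signal_verdict_t;

/* Hypothetical stand-in for the fields the handler reads from the message. */
typedef struct {
    bool is_platform_binary;   /* sender is an Apple platform binary */
    const char *signing_id;    /* sender's signing identifier */
    pid_t target_pid;          /* process the signal is aimed at */
} signal_request_t;

signal_verdict_t decide_signal(const signal_request_t *req, pid_t self_pid) {
    /* Signals sent by launchd are always allowed. */
    if (req->is_platform_binary && req->signing_id != NULL &&
        strcmp(req->signing_id, "com.apple.xpc.launchd") == 0) {
        return SIGNAL_ALLOW;
    }
    /* Any other sender is denied when the signal targets this daemon. */
    if (req->target_pid == self_pid) {
        return SIGNAL_DENY;
    }
    /* Signals aimed at other processes pass through. */
    return SIGNAL_ALLOW;
}
```

In the real handler, `self_pid` would be the daemon's own pid and the verdict would map onto `ES_AUTH_RESULT_ALLOW` or `ES_AUTH_RESULT_DENY`.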

Essentially if you create a daemon that uses the framework which runs under launchd, the app will get terminated on shutdown with no opportunity to unsubscribe, leaving the system hanging waiting on the app to respond.

What do you mean by “the app” in this context?

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"

Hi eskimo. By "the app", I mean the user-space process that is receiving and responding to endpoint security events.

Hmmm, I’m still confused. With reference to the quote in my 14 Feb post, you use two terms, “daemon” and “app”. Are they different things? Or just two terms for the same thing?

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"