I have two EndpointSecurity apps. App1 does the heavy lifting and processes multiple events:

```
ES_EVENT_TYPE_AUTH_EXEC,
ES_EVENT_TYPE_AUTH_OPEN,
ES_EVENT_TYPE_AUTH_RENAME,
ES_EVENT_TYPE_AUTH_UNLINK,
ES_EVENT_TYPE_NOTIFY_CLOSE,
ES_EVENT_TYPE_NOTIFY_CREATE
```

App2 is responsible for checking whether Full Disk Access is granted:

```cpp
#include <EndpointSecurity/EndpointSecurity.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    es_client_t *client;
    // Only create a client; no events are subscribed and the handler is empty.
    auto res = es_new_client(&client, ^(es_client_t *clt, const es_message_t *msg) { });
    // Exit with 0 if client creation was rejected as not permitted, 1 otherwise.
    _exit(res == ES_NEW_CLIENT_RESULT_ERR_NOT_PERMITTED ? 0 : 1);
}
```
(Please note that neither app is a System Extension.)
Sometimes, when App1 is running and App2 executes, the system hangs. After a few days of investigation, it seems that starting a new ES client (even without subscribing to any events) causes the system to block all operations until all currently pending auth requests from other clients are answered.
This makes it very hard to reason about the application. It also effectively forces us to offload everything to background threads and guard every file-system operation with a timer (even operations on non-blocking file descriptors) so that we can guarantee an answer in time; otherwise endpointsecurityd eventually kills all Endpoint Security apps once it detects a stuck one.
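To illustrate the "background thread plus timer" pattern described above, here is a minimal, portable C++ sketch. It is not EndpointSecurity code: `PendingAuth`, `Verdict`, `respondOnce`, and `answerBeforeDeadline` are hypothetical names invented for this example, and the deadline is a plain `std::chrono::steady_clock` time point. In a real client, the `respond` callback would presumably wrap `es_respond_auth_result` or `es_respond_flags_result`, and the deadline would be derived from the message's `deadline` field.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical verdict type standing in for an allow/deny auth result.
enum class Verdict { Allow, Deny };

// Hypothetical wrapper around one pending AUTH request.
class PendingAuth {
public:
    using Clock = std::chrono::steady_clock;

    PendingAuth(Clock::time_point deadline, std::function<void(Verdict)> respond)
        : deadline_(deadline), respond_(std::move(respond)) {
        assert(respond_ && "a respond callback is required");
    }

    // Sends the verdict unless one was already sent.
    // Returns true only for the caller that actually sent it.
    bool respondOnce(Verdict v) {
        bool expected = false;
        if (!answered_.compare_exchange_strong(expected, true)) return false;
        respond_(v);
        return true;
    }

    Clock::time_point deadline() const { return deadline_; }

private:
    Clock::time_point deadline_;
    std::function<void(Verdict)> respond_;
    std::atomic<bool> answered_{false};
};

// Runs `decide` on a detached background thread. If it has not finished by
// `safetyMargin` before the deadline, sends `fallback` instead.
// Returns the verdict that was actually sent.
Verdict answerBeforeDeadline(std::shared_ptr<PendingAuth> auth,
                             std::function<Verdict()> decide,
                             Verdict fallback,
                             std::chrono::milliseconds safetyMargin) {
    std::promise<Verdict> done;
    std::future<Verdict> result = done.get_future();

    std::thread([auth, decide = std::move(decide), done = std::move(done)]() mutable {
        Verdict v = decide();   // potentially slow work (file system, IPC, ...)
        auth->respondOnce(v);   // no-op if the fallback was already sent
        done.set_value(v);
    }).detach();

    const auto cutoff = auth->deadline() - safetyMargin;
    if (result.wait_until(cutoff) == std::future_status::ready) {
        return result.get();    // the worker answered in time
    }
    if (auth->respondOnce(fallback)) {
        return fallback;        // the worker was too slow; fallback sent
    }
    return result.get();        // the worker answered just after the cutoff
}
```

The atomic flag ensures each request is answered exactly once, whichever of the worker or the timer gets there first. Note that this only bounds the response time of one client; it does not explain or avoid the system-wide stall described above.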
I want to note that I typically have to run the helper app (App2) in a loop to induce this; there may be some other race condition happening at the same time. Originally, though, a single execution seemed to be enough to cause the hang in rare circumstances.
Is this intended behavior? If so, is there a way to detect such an event ("you have to answer all pending requests immediately while basically not being able to do anything") from the main app? Right now it seems that starting a new Endpoint Security client may lead to a temporary hang of the system…