es_mute_process fails intermittently

Hi,
I am monitoring ES_EVENT_TYPE_AUTH_OPEN events, but I am interested in only a few processes (the list is not hardcoded; it is configurable).

So I am using es_mute_process to mute most of the processes from the ES_EVENT_TYPE_AUTH_OPEN event callback.

After some time, es_mute_process calls start failing.
When I check the current number of muted processes using es_muted_processes, it is always 255.

Is there an upper limit on number of processes that can be muted?

If there is a limit and I keep trying to mute processes after reaching it (since slots might become available as processes exit), would that have any impact on performance?
Am I supposed to detect the error and maybe stop muting processes for some time?
After some time, es_mute_process calls start failing.

What does that failure look like? Do you get some sort of error result? Or does the call succeed but then the process isn’t actually muted?

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@apple.com"
Return value from the function is not ES_RETURN_SUCCESS.
I tried filtering console logs by endpointsecurityd. There are no additional logs when mute fails.

I can reproduce this as follows: I leave the client running and keep launching processes. For example, I launch Simulator from Xcode (and Safari inside it), which seems to launch a lot of processes.

Code Block
// Subscribed to only ES_EVENT_TYPE_AUTH_OPEN.
es_new_client(&client, ^(es_client_t *, const es_message_t *message) {
    // Copy the message so it stays valid after the handler returns.
    es_message_t *msg = es_copy_message(message);
    dispatch_async(dispatch_get_main_queue(), ^{
        // Allow all open flags and let ES cache the result.
        uint32_t fflags = 0xffffffff;
        es_respond_flags_result(client, msg, fflags, true);
        // Try to mute the process that generated this event.
        es_return_t res = es_mute_process(client, &msg->process->audit_token);
        if (res != ES_RETURN_SUCCESS)
            std::cout << "mute failed " << res << std::endl;
        es_free_message(msg);
    });
});



I was not able to reproduce the issue with the following experiments, but since the issue is intermittent I can't say for sure that they never trigger it.

In a simple program I listed all processes and muted all of them (close to 400). I don't see the error there.
I ran this periodically at a 1 second interval and still saw no error.

Most processes for which I see the error (some are listed below) seem to be short-lived.
So I also tried a simple program that does the following:
  • in the open callback, check whether the process is, say, Safari
  • copy its audit token from the message
  • kill the process
  • in an async dispatch block, wait for 2 seconds and then mute the process

Even this succeeds.


Some of the processes for which I see mute failures (there are many more):
/bin/bash
/usr/bin/su
/usr/bin/security
/usr/bin/dscl
/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Library/Developer/CoreSimulator/Profiles/Runtimes/iOS.simruntime/Contents/Resources/RuntimeRoot/usr/libexec/nsurlstoraged
/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Library/Developer/CoreSimulator/Profiles/Runtimes/iOS.simruntime/Contents/Resources/RuntimeRoot/System/Library/PrivateFrameworks/TCC.framework/tccd
/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Library/Developer/CoreSimulator/Profiles/Runtimes/iOS.simruntime/Contents/Resources/RuntimeRoot/System/Library/Frameworks/Security.framework/CircleJoinRequested/CircleJoinRequested

Return value from the function is not ES_RETURN_SUCCESS.

So what is it?

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@apple.com"
It is always 1 (ES_RETURN_ERROR).
These are the only two possible values documented for es_return_t, the return type of es_mute_process.

These are the only two possible values documented for es_return_t

Indeed.

Unfortunately the code inside es_mute_process that converts the underlying error to an es_return_t doesn’t log that error, so there’s no easy way to discover it.

If you’re up for some assembly-level debugging, you could step into es_mute_process and look for the result from IOConnectCallStructMethod:

Code Block
(lldb) disas -n es_mute_process
libEndpointSecurity.dylib`es_mute_process:
… <+0>: pushq %rbp
… <+1>: movq %rsp, %rbp
… <+4>: movq %rsi, %rdx
… <+7>: movl (%rdi), %edi
… <+9>: movl $0x20, %ecx
… <+14>: movl $0x4, %esi
… <+19>: xorl %r8d, %r8d
… <+22>: xorl %r9d, %r9d
… <+25>: callq 0xbec8 ; symbol stub for: IOConnectCallStructMethod
… <+30>: xorl %ecx, %ecx
… <+32>: testl %eax, %eax
… <+34>: setne %cl
… <+37>: movl %ecx, %eax
… <+39>: popq %rbp
… <+40>: retq
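
To make the listing above easier to read, here is a rough C++ rendering of what its last few instructions do. This is only a sketch based on the disassembly shown in this thread, not Apple's source. The `sketch_` names are stand-ins I made up so it builds without the EndpointSecurity headers, and the 0 and 1 values follow the results reported earlier in the thread.

```cpp
#include <cstdint>

// Stand-in types so the sketch builds anywhere. The real definitions live in
// <EndpointSecurity/ESTypes.h> and <mach/kern_return.h>.
enum sketch_es_return_t {
    SKETCH_ES_RETURN_SUCCESS = 0,
    SKETCH_ES_RETURN_ERROR = 1
};
using sketch_kern_return_t = int32_t;

// Mirrors the tail of the listing: xorl %ecx,%ecx / testl %eax,%eax / setne %cl.
// Zero maps to success; every non-zero kernel result maps to the same error
// value, so the underlying code is lost at this point.
inline sketch_es_return_t collapse_result(sketch_kern_return_t kr) {
    return kr != 0 ? SKETCH_ES_RETURN_ERROR : SKETCH_ES_RETURN_SUCCESS;
}
```

This is why the caller only ever sees 0 or 1, and why a breakpoint just after the call to IOConnectCallStructMethod is needed to read the underlying value from %eax.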


Alternatively, you could look at the system log to see if the kernel side logs more about the error.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@apple.com"
Thanks for the reply.
The return value from IOConnectCallStructMethod on error is 0xE00002BC, which I suppose is also a generic error.

I think it maps to the following in the xnu code, but I could be wrong:
#define kIOReturnError iokit_common_err(0x2bc) // general error
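
One way to check that mapping is to split the value into the system, subsystem, and code fields used by Mach error codes. The sketch below assumes the field layout from <mach/error.h> (6-bit system, 12-bit subsystem, 14-bit code) and the IOKit constants from <IOKit/IOReturn.h>. The helper and constant names are my own, chosen so it builds without those headers.

```cpp
#include <cstdint>

// Field extraction, equivalent to err_get_system / err_get_sub / err_get_code
// in <mach/error.h>.
constexpr uint32_t error_system(uint32_t err) { return (err >> 26) & 0x3f; }
constexpr uint32_t error_sub(uint32_t err)    { return (err >> 14) & 0xfff; }
constexpr uint32_t error_code(uint32_t err)   { return err & 0x3fff; }

// Values behind sys_iokit, sub_iokit_common, and kIOReturnError in
// <IOKit/IOReturn.h> (hypothetical names, real values).
constexpr uint32_t kSketchSysIOKit = 0x38;
constexpr uint32_t kSketchSubIOKitCommon = 0x0;
constexpr uint32_t kSketchCodeGeneralError = 0x2bc;

// True when err is the IOKit "general error" value.
constexpr bool is_iokit_general_error(uint32_t err) {
    return error_system(err) == kSketchSysIOKit &&
           error_sub(err) == kSketchSubIOKitCommon &&
           error_code(err) == kSketchCodeGeneralError;
}
```

For 0xE00002BC the fields come out as system 0x38, subsystem 0, and code 0x2bc, which is consistent with kIOReturnError. That confirms the value is the general error and carries no more specific reason.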

I could not see any relevant logs (for the kernel process or otherwise) in Console, even with Debug and Info messages enabled.
There are not many logs from endpointsecurityd either. Can you suggest a specific process or log that I should look for?
Accepted Answer

Is there an upper limit on number of processes that can be muted?

Yes. We increased the limit substantially in 10.15.6 and this can be tested in the latest betas.

If there is a limit and I keep trying to mute processes after reaching it (since slots might become available as processes exit), would that have any impact on performance?

I would expect the impact to be minimal. There is a non-zero cost to attempting to mute a process as this call has to enter the kernel, but it is unlikely to be more than most syscalls.

And just to confirm as you noted - ES will automatically remove items from the set of muted processes as they exit, there is nothing for your client to do manually.

Am I supposed to detect the error and maybe stop muting processes for some time?

It isn't strictly necessary to stop attempting to mute - just know that it can fail for various reasons, mainly reaching the limit.
Thanks for the information. This is helpful. I will check on 10.15.6 (I haven't upgraded yet).

This implies that muting a process at exec time is not enough: the mute may fail, and I would still see open events from such processes. So filtering and muting from the open auth callback is also required.
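
To make that concrete, here is a minimal sketch of the decision logic, kept separate from the Endpoint Security calls so it can be exercised on its own. Everything in it is hypothetical: `OpenFilter`, `watched_paths`, and `try_mute` are names I made up, and `try_mute` stands in for a wrapper that calls es_mute_process and reports whether it returned ES_RETURN_SUCCESS.

```cpp
#include <functional>
#include <string>
#include <unordered_set>

enum class OpenDecision { Allow, Inspect };

// Hypothetical helper: decides what to do with one AUTH_OPEN event.
struct OpenFilter {
    // Configurable list of executable paths we actually care about.
    std::unordered_set<std::string> watched_paths;
    // Stand-in for "es_mute_process(...) == ES_RETURN_SUCCESS".
    std::function<bool()> try_mute;
    // How many mute attempts have failed so far (e.g. limit reached).
    unsigned failed_mutes = 0;

    OpenDecision on_auth_open(const std::string &process_path) {
        if (watched_paths.count(process_path) != 0)
            return OpenDecision::Inspect;   // interesting process, never mute
        // Uninteresting process: try to mute it. If the mute fails we just
        // count it; the same path-based check runs again on its next event.
        if (try_mute && !try_mute())
            ++failed_mutes;
        return OpenDecision::Allow;
    }
};
```

The point of this shape is that a failed mute never changes the response: the path check is always the source of truth, and muting is only an optimization that reduces how often the check has to run.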

Does es_mute_path_literal also have such limits?
The WWDC session "Build an Endpoint Security app" advised against using too many paths with es_mute_path_literal.
So is it a bad idea to use it if I am muting most processes from which I see open auth events?

What I mean is that if muting by audit token fails, then in any case:
  • I keep receiving open auth events
  • I filter those based on process path and respond with allow

Would it be better to use es_mute_path_literal in such cases?


Does es_mute_path_literal also have such limits?

Nothing is infinite :). Muting by paths is a very different mechanism than the conceptual "mute by set of audit tokens". The limits are harder to define.

So is it a bad idea to use it if I am muting most processes from which I see open auth events?

Not at all! The comment from the session is not meant to discourage usage of the API. But if you find yourself muting tons of paths, it may indicate that your design should be reconsidered. The API, when used appropriately, can be a very useful tool. You'll need to test against both expected and peak event volumes to ensure performance tradeoffs are accounted for.

Would it be better to use es_mute_path_literal in such cases?

Potentially yeah. If you have a known set of paths (or path prefixes) that you will always want to mute, you might see a performance benefit to muting by paths instead of waiting for arbitrary EXEC events, seeing if that process matches some rule you've defined, and then muting by audit token.


Thanks for all the details. I will try out muting by path and, as you suggested, measure the impact under different event volumes.
As the Security Engineer indicated, I no longer see the error from es_mute_process on 10.15.6 (even with Simulator and lots of other processes running).