How to identify the user who performed an action reported by Endpoint Security

Hello, my application monitors ES_EVENT_TYPE_NOTIFY_CLOSE. If a file is dragged to another location in Finder, Endpoint Security reports that the ES_EVENT_TYPE_NOTIFY_CLOSE event was performed by '/usr/libexec/xpcproxy'. So xpcproxy is the process that performed ES_EVENT_TYPE_NOTIFY_CLOSE. It looks like the dragged file is copied by some XPC service.

I have found that the audit user ID is equal to the ID of the user who dragged the file.

Can the audit user ID be used to identify the user who triggered the file copy in this case? If not, is there any other way to determine this information?

Thank you in advance!


Hello, my application monitors ES_EVENT_TYPE_NOTIFY_CLOSE. If a file is dragged to another location in Finder, Endpoint Security reports that the ES_EVENT_TYPE_NOTIFY_CLOSE event was performed by '/usr/libexec/xpcproxy'. So xpcproxy is the process that performed ES_EVENT_TYPE_NOTIFY_CLOSE.

Sort of. xpcproxy is basically the "starter process" that XPC services are initialized from; however, it should very quickly be replaced by the "real" service (which does the actual work), and that "real" service is what I'd expect most events to be ascribed to.

I think what might have happened here is that the XPC service never explicitly closed the file (probably intentionally, as there's no reason to if you're about to exit), so the close happened during process destruction. That also meant that the activity was ascribed to "xpcproxy", when the "real" source was the original service.

Looks like the dragged file is copied by some XPC service.

Yes, probably "DesktopServicesHelper", which does most of the Finder's "real" file work (copying, emptying the Trash, etc.).

I have found that the audit user ID is equal to the ID of the user who dragged the file.

Can the audit user ID be used to identify the user who triggered the file copy in this case?

If not, is there any other way to determine this information?

First off, please go read "Inferring High-Level Semantics from Low-Level Operations". The kind of inference you're trying to make here is very common (and nearly unavoidable) in ES clients, but it's critical that you're aware that you are in fact making inferences based on the data available, not getting true "answers".

So, getting into specifics: in addition to the source process's own audit token, your ES client is given two other audit tokens:

parent_audit_token -> This identifies the direct parent of the source process. For most app and XPC service launches, this will be launchd (pid 1). In practice, most of the processes on a system end up being launched this way, which means it's usually not very interesting.

NOTE: If you're going to look at parent_audit_token, one trivial optimization is to check for launchd by comparing ppid (or original_ppid, depending on what you're interested in) to "1". PID reuse means this kind of comparison is not safe in the "general" case (this is why audit tokens exist), but is safe in the specific case of launchd. launchd will ALWAYS be pid 1 and pid 1 will ALWAYS be launchd*.

*Architecturally, launchd is pid 1 because the kernel doesn't create it through fork (you can't fork if no process exists) but instead creates it "manually", constructing its process structure and then "directly" starting its execution through the scheduler. This entire process is a one-time initialization that happens early in the boot process and cannot really be "repeated". If launchd terminates, the kernel makes no attempt to recover but instead just reboots the entire system, making launchd exiting the functional equivalent of a restart.

responsible_audit_token -> This is the process that originally "asked" launchd to create the process, which means that, in practice, this is the process that's actually interesting. In the case of Finder file operations, I'd expect the Finder to be listed as the responsible process for the service, and the user ID of the Finder would be the user who initiated the copy.

The one warning about this technique is this note in the documentation:

"The responsible process may be this process itself, if there’s no responsible process or the responsible process already exited."

That dynamic means that if you're trying to reliably track responsibility, you probably need to do that work "early" (probably at process creation), not "late" (when the responsible process may already have exited). More broadly, this is the kind of work that should be done once, as part of the mechanism your client uses to track activity over time, NOT "later" as part of individual event processing.
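As a rough illustration of doing that work "early", here is a tiny table that records the responsible user when a process is created and answers the question later, when an event such as a close arrives. Everything here (`proc_key_t`, `proc_table_record`, `proc_table_lookup`, the 64-slot size) is hypothetical; a real client would need a much larger, thread-safe structure and would remove entries when processes exit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical key identifying one process instance. */
typedef struct {
    pid_t pid;
    int   pidversion;
} proc_key_t;

/* What we choose to remember about a process at creation time. */
typedef struct {
    proc_key_t key;
    uid_t responsible_uid;
    bool in_use;
} proc_record_t;

#define PROC_TABLE_SIZE 64 /* deliberately tiny for illustration */

typedef struct {
    proc_record_t slots[PROC_TABLE_SIZE];
} proc_table_t;

static size_t slot_for(proc_key_t k)
{
    return ((size_t)k.pid * 31u + (size_t)k.pidversion) % PROC_TABLE_SIZE;
}

static bool key_equal(proc_key_t a, proc_key_t b)
{
    return a.pid == b.pid && a.pidversion == b.pidversion;
}

/* Call when the process is created (e.g. from an exec or fork
 * notification), while the responsible process is still alive.
 * Linear probing; returns false if the table is full. */
static bool proc_table_record(proc_table_t *t, proc_key_t k, uid_t responsible_uid)
{
    assert(t != NULL);
    size_t start = slot_for(k);
    for (size_t i = 0; i < PROC_TABLE_SIZE; i++) {
        proc_record_t *r = &t->slots[(start + i) % PROC_TABLE_SIZE];
        if (!r->in_use || key_equal(r->key, k)) {
            r->key = k;
            r->responsible_uid = responsible_uid;
            r->in_use = true;
            return true;
        }
    }
    return false;
}

/* Call later, when handling an individual event for that process. */
static bool proc_table_lookup(const proc_table_t *t, proc_key_t k, uid_t *out_uid)
{
    assert(t != NULL && out_uid != NULL);
    size_t start = slot_for(k);
    for (size_t i = 0; i < PROC_TABLE_SIZE; i++) {
        const proc_record_t *r = &t->slots[(start + i) % PROC_TABLE_SIZE];
        if (!r->in_use)
            return false; /* empty slot ends the probe sequence */
        if (key_equal(r->key, k)) {
            *out_uid = r->responsible_uid;
            return true;
        }
    }
    return false;
}
```

Keying on pid plus pidversion means a reused pid does not inherit an earlier process's answer. Removal is deliberately left out: with linear probing it needs tombstones so that later lookups don't stop early at a freed slot.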

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Well... it looks like this is not a trivial task. Thank you very much for your response and tips!

What is the audit user ID? Can I assume that the 'auid' equals the real user ID of the application that sent the message to the XPC service?

Accepted Answer

Well... it looks like this is not a trivial task. Thank you very much for your response and tips!

Yes, and that's a good excuse to pass along my general disclaimer/warning about ES clients. You're working with a VERY dangerous and tricky API, with nearly unlimited potential to damage/disrupt the system. There are two different reasons for this:

  • (Obvious) The ability to "veto" a broad set of critical system calls is obviously both powerful and dangerous.

  • (Nonobvious) That veto power means that every ES client is part of the execution path for the syscalls it authorizes.

That second point is what makes ES clients so dangerous. The consequences of slowing down high-volume, critical syscalls range from "annoying" (Why is my system slow?) to "catastrophic" (Why does my system keep panicking?) to "completely baffling" (Why does my system work fine, except that it now takes 5 minutes to open an attachment?).

The CLASSIC pattern here is the following:

  1. The ES client is built in a "straightforward" way, typically ONLY looking at individual events without really considering broader system performance or context.

  2. The ES client seems to work fine in "basic" testing and is eventually shipped.

  3. It's eventually discovered that the ES client is generating <arbitrary failure> in <random situation>.

  4. That failure is addressed as a specific "bug".

  5. Steps 3 & 4 continue indefinitely until someone realizes there is a much deeper failure going on and addresses the underlying design issues.

Here are the key things I'd focus on here:

  • Take some time and experiment to see what "bad" actually looks like. Make an ES client that panics the kernel. Make another one that lets the system function but slows the system enough to make it a "miserable" experience.

  • Your ES client has VERY little time to process any given event, FAR less than our standard deadline behavior would imply. See this post for more on that point.

  • Use the API to get rid of everything you don't need. Mute processes you don't need. Use the "cache" flag to eliminate duplicate auth requests.

  • Keep in mind that apps do lots of "weird" things. For example, it's VERY common for "basic" operations, like an app opening a document, to involve opening the same file over and over again.

  • Be your own worst enemy, particularly when it comes to testing your product. Build testing scenarios that intentionally push your client to "destruction". Many clients have problems running alongside other ES clients, so I would both test with other products in common use AND build my own "pathologically bad" client that I could test "against".

Moving to your specific questions:

What is the audit user ID?

The core UNIX semantics are that every process has two user IDs:

real user id (ruid) -> The user ID that created a given process.

effective user id (euid) -> The user ID the system should use when evaluating "decisions".

The complication here is what happens in the following sequence:

  • Process A is launched as root (ruid: 0, euid: 0)

  • Process A uses setuid to switch to user 501, which (because it's running as root) changes both its real and effective IDs (ruid: 501, euid: 501)

  • Process A forks, creating Process B.

At this point, process B's configuration will be (ruid: 501, euid: 501), since a forked child inherits its parent's user IDs, and nothing in either ID records the user the work originally came from, losing the connection "back" to the original user. The auid is the solution to this.

audit user id (auid) -> The user ID of the user that actually "started" the work. Informally, that translates to the logged-in user behind "the work".
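The interaction between the three IDs can be captured in a few lines. This is a toy model with hypothetical names (`creds_t`, `model_login`, `model_switch_user`, `model_fork`), not kernel code. It assumes that the auid is set once when the user logs in and is then inherited unchanged, while su/sudo-style tools change only the real and effective IDs.

```c
#include <assert.h>
#include <sys/types.h>

/* Toy model of the three user IDs; not a real kernel structure. */
typedef struct {
    uid_t auid; /* audit user id: set at login, then inherited unchanged */
    uid_t ruid; /* real user id */
    uid_t euid; /* effective user id */
} creds_t;

/* Login establishes the session: all three IDs start out equal. */
static creds_t model_login(uid_t user)
{
    creds_t c = { .auid = user, .ruid = user, .euid = user };
    return c;
}

/* Models an authorized su/sudo-style switch: the real and effective
 * IDs change, the audit ID does not. */
static creds_t model_switch_user(creds_t c, uid_t target)
{
    c.ruid = target;
    c.euid = target;
    return c;
}

/* A forked child inherits its parent's IDs as-is. */
static creds_t model_fork(creds_t parent)
{
    return parent;
}
```

Logging in as 501, switching to root, and forking leaves the child at auid 501 with ruid/euid 0, which is the su/sudo case discussed below: the auid still points back at the logged-in user even though the other two IDs no longer do.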

Can I assume that the 'auid' equals the real user ID of the application that sent the message to the XPC service?

No, not necessarily. It probably is "most" of the time, but the common case here is actually that they ALL match (auid == ruid == euid), since most processes don't mess with any of this.

I'm honestly not sure what "patterns" you'd actually see on "live" machines since, to be honest, I've never looked that closely. Off the top of my head, my guess would be:

  1. They all match (most common)

  2. auid: 0, ruid/euid: <something else> -> I think this is what you'll see in LaunchDaemons that were launched as other users, though it's possible launchd "forces" this to #1.

  3. auid: <user's ID>, ruid/euid: 0/<something else> -> This is the user using su/sudo to switch users.

Side Note: This is something you'd want to test and confirm experimentally, but my intuition is that having different ruids/euids is relatively rare within the system. There are a very small number of command line tools that use it (su/sudo being obvious examples), but the session architecture of the system means that simply switching user IDs isn't as "broadly" useful as it would be on other UNIX systems. For more information about execution contexts and sessions, see "TN2083 Daemons and Agents".
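One way to run that experiment is to classify the (auid, ruid, euid) triple of each process your client sees (for example, at exec time) and count how often each pattern occurs. The categories mirror the unconfirmed guesses listed above, and all names are hypothetical. `UNSET_AUID` is an assumption modeled on the BSM `AU_DEFAUDITID` value, so that processes which never had an audit user ID assigned are not miscounted as user switches.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Assumption: mirrors AU_DEFAUDITID ((uid_t)-1) from the BSM headers. */
#define UNSET_AUID ((uid_t)-1)

typedef enum {
    UID_PATTERN_ALL_MATCH,     /* 1. auid == ruid == euid */
    UID_PATTERN_ROOT_AUDIT,    /* 2. auid 0, ruid == euid != 0 */
    UID_PATTERN_USER_SWITCHED, /* 3. auid is a user, ruid/euid differ (su/sudo) */
    UID_PATTERN_AUID_UNSET,    /* no audit user id was ever assigned */
    UID_PATTERN_OTHER,         /* anything the guesses above don't cover */
    UID_PATTERN_COUNT
} uid_pattern_t;

static uid_pattern_t classify_uids(uid_t auid, uid_t ruid, uid_t euid)
{
    if (auid == UNSET_AUID)
        return UID_PATTERN_AUID_UNSET;
    if (auid == ruid && ruid == euid)
        return UID_PATTERN_ALL_MATCH;
    if (auid == 0 && ruid == euid)
        return UID_PATTERN_ROOT_AUDIT;    /* ruid != 0 here, since not all match */
    if (auid != 0)
        return UID_PATTERN_USER_SWITCHED; /* ruid or euid differs from auid */
    return UID_PATTERN_OTHER;             /* auid 0 with ruid != euid */
}

/* Running counts per pattern, e.g. accumulated over a day of exec events. */
typedef struct {
    size_t counts[UID_PATTERN_COUNT];
} uid_tally_t;

static void uid_tally_add(uid_tally_t *tally, uid_t auid, uid_t ruid, uid_t euid)
{
    assert(tally != NULL);
    tally->counts[classify_uids(auid, ruid, euid)]++;
}
```

Any sizable count in `UID_PATTERN_OTHER` would be a sign that the guesses above don't match what your machines actually do.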

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thank you a lot for this detailed answer!

You're very welcome!

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
