Daemon hanging during login

We noticed that our daemon hangs while a user login is in progress. This is awkward because we're subscribed as an EndpointSecurity client, and a stall at exactly that moment rather defeats our purpose. I took one spindump, which shows virtually zero activity in our daemon over at least the default 10 seconds.
  1. Is this expected or known behavior? Can this be mitigated or ideally avoided, maybe by some launchd configuration option we're unaware of?

  2. Our daemon also opens up a Unix domain socket connection to a user agent for every user session. These socket connections regularly exceed a 10-second timeout in select when a user logs in or out. I'm kinda okay with this if it happens for the same user session during logout, but it also seems to occur for other session logins. Is this just a thing we have to live with when using sockets? Does XPC behave better in this regard?
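
For reference, the kind of wait described in point 2 can be sketched roughly as follows. This is a minimal illustration, not our actual code: the helper name wait_readable is made up for this post, and it assumes the daemon simply waits for the agent's socket to become readable.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: waits until fd is readable or timeout_sec elapses.
 * Returns 1 if readable, 0 on timeout, -1 on error (errno is set). */
int wait_readable(int fd, long timeout_sec)
{
    /* select() can only watch descriptors below FD_SETSIZE. */
    if (fd < 0 || fd >= FD_SETSIZE) {
        errno = EINVAL;
        return -1;
    }

    for (;;) {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);

        /* Re-initialised on every pass because select() may modify it. */
        struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };

        int rc = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (rc > 0)
            return 1;   /* data (or EOF) is available */
        if (rc == 0)
            return 0;   /* timed out: the peer was slow, not necessarily dead */
        if (errno != EINTR)
            return -1;  /* genuine error */
        /* Interrupted by a signal: retry. Simplification: the retry
         * restarts the full timeout instead of the remaining time. */
    }
}
```

With something like wait_readable(agent_fd, 10), a return value of 0 is what we see around logins and logouts. It only says that nothing arrived within 10 seconds, which is also what a starved daemon or agent would look like.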

Thanks for any insights.

Replies

I don't seem to be able to edit the original post anymore, so here is an update regarding point 2:

but it also seems to occur for other session logins.

This turns out not to be verifiable in our case. (I was misled by the output of last. It looks like last does not honor the time of the GUI session logout but just keeps counting, since another, silent "background" session remains for the given user. That session is still a mystery of its own to me, to be honest.)

At this time, I cannot tell for sure whether login, logout, or both cause the issue, or how sessions interact with each other. I am sure, however, that any such activity slows our daemon process down considerably.

Is this expected or known behavior?

No, something is seriously wrong here. ES clients exist in a very privileged position within the system and so you have to be super careful not to accidentally trigger a deadlock by using general system services.

I don't seem to be able to edit the original post anymore

Right. DevForums posts are only editable for a short time after you create them. For this and other tidbits, see Developer > Support > Developer Forums.

Our daemon also opens up a Unix domain socket connection to a user
agent for every user session.

Your what now? Reading the above it seems like the agent is listening on the UNIX domain socket and the daemon is connecting to it. Is that right?

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"
Thanks, Quinn.

trigger a deadlock by using general system services

I assume you mean in response to an ES AUTH event. We don't block on those, currently.

Your what now? Reading the above it seems like the agent is listening on the UNIX domain socket and the daemon is connecting to it. Is that right?

Yeah I might have phrased that awkwardly... You understood correctly.
Regarding 1., after one week of tracing, debugging and experimenting I now understand that the scheduler deprioritizes our daemon during GUI session logins because our launchd ProcessType was not Interactive and our dispatch queues were not running at QOS_CLASS_USER_INTERACTIVE. In hindsight this makes sense, although the terminology is somewhat misleading in the context of a daemon.
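
For anyone landing here later, the launchd side of this looks roughly like the fragment below. It is only a sketch: the label and program path are placeholders I made up, and ProcessType is the only key that matters for this discussion. The dispatch queue QoS is set separately in code and is not shown here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- Placeholder label and path, not our real ones. -->
    <key>Label</key>
    <string>com.example.esdaemon</string>
    <key>ProgramArguments</key>
    <array>
        <string>/Library/PrivilegedHelperTools/com.example.esdaemon</string>
    </array>
    <!-- Asks launchd to treat the job like an interactive app
         rather than applying the default daemon resource limits. -->
    <key>ProcessType</key>
    <string>Interactive</string>
</dict>
</plist>
```

Whether Interactive is appropriate for a given daemon is a judgment call; it trades away the resource limits the system would otherwise apply to background work.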

Regarding 2., how big of an issue this really is now remains to be seen. Quinn, if you're reading this, you commented that

XPC is a lot faster than UNIX domain sockets

I am generally interested in how and where this increase in speed manifests (peer lookup, connection, transmission, fault notification). Also, do you know if it would, or at least believe that it could, apply to our situation, i.e. XPC being helpful?

I am generally interested in how and where this increase in speed
manifests

I don’t have any benchmarks to point you at but my expectation is that XPC will be faster in all aspects. UNIX domain sockets are built on the BSD Socket infrastructure, which was designed for (1980s era) network speeds. XPC is built on top of Mach messaging, which was designed from day one to be a local IPC mechanism [1].

There is, however, one place where XPC offers a guaranteed win, namely that it supports QoS propagation. If a client is running at a high QoS and it invokes an XPC service, that’ll propagate the client’s QoS to the service to avoid priority inversions.

The best explanation of this is WWDC 2014 Session 716 “Power, Performance, and Diagnostics: What’s New in GCD and XPC”. Unfortunately that video is no longer available from our site. However, if you search the ’net for the session title you should be able to find the slides.

Oh, while I’m here let me put a plug in for the following:
The second one is especially important.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

[1] Whether it achieves that design goal is debatable )-: Mach messaging has a lot of cruft that makes it much slower than it should be.

designed from day one to be a local IPC mechanism

Thanks, that makes sense.

I have yet to watch those sessions but while we're putting plugs, let me add one more that helped me better understand GCD:

Making efficient use of the libdispatch (GCD)

I have yet to watch those sessions but while we're putting plugs, let
me add one more that helped me better understand GCD:

Making efficient use of the libdispatch (GCD).

Yeah, that doc has lots of good advice. (Indeed, I’m only responding here so that this post, with your link, ends up in my EagleFiler database :-)

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"