This is a follow-up to feedback FB9144718, which we also discussed at a WWDC21 "Performance, power, and stability" lab session.
Issue Summary
The actual issue we are facing is that our XPC service is not running as fast as we would expect it to, especially on Intel machines; it is somewhat better on M1 machines, but still not really good.
After a lot of profiling with Instruments, it finally turned out that the problem is caused by our processing getting regularly stopped: our processing thread is being preempted and put on hold for a sometimes tremendous amount of time (we have measured pauses of over 32 ms). Even if it is preempted for just a couple of milliseconds most of the time, that is still a lot considering that the actual work it would otherwise perform is only in the range of microseconds.
The likely reason this is happening is that we don't use the XPC service just for processing application messages through the XPC protocol, which we do as well, but also to retrieve requests through a mach port from another process.
This causes our thread priority to drop to 4 (see the highlighted log line), and that's why we get preempted for so long. The reason it's not equally dramatic on M1 is that we are not preempted there; instead we are forced to run on the high-efficiency cores instead of the high-performance ones.
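For illustration, here is a minimal sketch (not our actual code; all names are placeholders) of the structure described above: the service accepts ordinary XPC connections via xpc_main() and additionally listens on a mach receive port through a dispatch source. Nothing on the mach-port path is tied to an XPC message or transaction.

```c
#include <xpc/xpc.h>
#include <dispatch/dispatch.h>
#include <mach/mach.h>

// Placeholder for our actual mach request processing; the real work per
// request is only in the range of microseconds.
static void handle_mach_request(mach_port_t port) {
    (void)port; // dequeue and process one mach message here
}

// Handler for incoming XPC connections from the application.
static void connection_handler(xpc_connection_t connection) {
    xpc_connection_set_event_handler(connection, ^(xpc_object_t message) {
        (void)message; // handle application messages sent over XPC
    });
    xpc_connection_resume(connection);
}

int main(void) {
    mach_port_t recv_port = MACH_PORT_NULL;
    mach_port_allocate(mach_task_self(), MACH_PORT_RIGHT_RECEIVE, &recv_port);

    // Mach messages are delivered via a dispatch source; this path does not
    // involve any XPC transaction, so the service looks "inactive" to XPC.
    dispatch_source_t src = dispatch_source_create(
        DISPATCH_SOURCE_TYPE_MACH_RECV, recv_port, 0,
        dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0));
    dispatch_source_set_event_handler(src, ^{ handle_mach_request(recv_port); });
    dispatch_resume(src);

    xpc_main(connection_handler); // serves XPC connections; never returns
}
```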
Ideas from the Lab
Completely restructuring our entire implementation is eventually going to happen anyway for Big Sur and newer, but we still have to maintain the current structure as long as we also need to support pre-Big Sur macOS versions, so it would be great to have a less drastic fix in the meantime.
Two suggestions were made at the lab:
- Change the RunLoopType in the XPC Plist from dispatch_main to NSRunLoop. We tried that, but it didn't make any difference.
- Add a key ProcessType with the value Interactive to the XPC Plist. This key is documented only for launchd daemons, not for XPC services, but we were told it should actually work for XPC services as well. We tried that too, both at the top level and under the XPC sub-key, but it didn't make a difference either (see the plist fragment after this list).
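For reference, these are the two plist fragments we tried (shown only as an illustration; the placement within the Info.plist is as described above, i.e. both at the top level and under the sub-key):

```xml
<!-- First suggestion: switch the run loop type -->
<key>RunLoopType</key>
<string>NSRunLoop</string>

<!-- Second suggestion: mark the service as interactive -->
<key>ProcessType</key>
<string>Interactive</string>
```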
Another Idea That Didn't Work
Now that second suggestion made me look up that key in the man page for launchd.plist, and what I found there was pretty interesting. Apparently there is a ProcessType value documented as Adaptive:

"Adaptive jobs move between the Background and Interactive classifications based on activity over XPC connections. See xpc_transaction_begin(3) for details."

This seems to be our problem: our XPC service is considered inactive while it processes messages over the mach port. Looking up the documentation of xpc_transaction_begin(3) tells me:
"Services may extend the default behavior using xpc_transaction_begin() and xpc_transaction_end(), which increment and decrement the transaction count respectively. This may be necessary for services that send periodic messages to their clients, not in direct reply to a received message."
Using these two functions also frees us from having to enable/disable sudden termination on our own, as that is automatically controlled by them as well. Yet even using these two functions to indicate activity doesn't prevent us from being preempted at regular intervals: our priority still drops to level 4 while we are still in the middle of processing (i.e. before we have called xpc_transaction_end()). We do seem to use them correctly, though, as sudden termination is correctly disabled on our behalf as long as our XPC service remains in the active state (it will only receive mach messages for processing while in that state) and re-enabled when we leave the active state again.
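A minimal sketch of how we bracket the mach message processing with these two calls (process_request stands in for our actual handler):

```c
#include <xpc/xpc.h>
#include <mach/mach.h>

// Stand-in for our actual mach request handling.
static void process_request(mach_port_t port) {
    (void)port;
}

static void handle_mach_request(mach_port_t port) {
    xpc_transaction_begin(); // service becomes "active"; sudden termination is disabled for us
    process_request(port);   // observed: the thread still drops to priority 4 in here
    xpc_transaction_end();   // service becomes "inactive" again; sudden termination re-enabled
}
```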
Final Thoughts
The man page of xpc_transaction_begin() also states:

"The XPC runtime will also automatically manage the service's priority based on where a message came from. If an app sends a message to the service, the act of sending that message will boost the destination service's priority and resource limits so that it can more quickly fill the request. If, however, a service gets a message from a background process, the service stays at a lower priority so as not to interfere with work initiated as a direct result of user interaction."
It looks like this does not work with the way we use the XPC service at the moment. Our mach port messages come either from a System Extension (Big Sur and up) or from a root daemon started by launchd (Catalina and below; its ProcessType is Interactive and its nice value is -10), but apparently these messages cannot boost our XPC service, so it stays at low priority.