How can we prevent our XPC service from being constantly preempted?

This is a follow-up to feedback FB9144718, which we also discussed at a WWDC21 "Performance, power, and stability" lab session.

Issue Summary

The actual issue we are facing is that our XPC service does not run as fast as we would expect, especially on Intel machines; it is somewhat better on M1 machines but still not really good.

After a lot of profiling with Instruments, it turned out that the problem is caused by our processing thread being regularly preempted and put on hold, sometimes for a tremendous amount of time (we have observed stalls of more than 32 ms). Even though it is preempted for just a couple of milliseconds most of the time, that is still a lot considering that the actual work it would otherwise perform takes only microseconds.

This probably happens because we don't use the XPC service only for processing application messages through the XPC protocol (which we do as well) but also receive requests through a mach port from another process.

This causes our thread priority to drop to 4 (see highlighted log line), and that is why we get preempted for so long. It is not equally dramatic on M1 because we are not preempted there; instead, we are forced to run on the high-efficiency cores rather than the high-performance ones.

Ideas from the Lab

Completely restructuring our implementation is going to happen eventually for Big Sur and newer, but we have to maintain the current structure as long as we also need to support pre-Big Sur macOS versions, so it would be great to have a less drastic fix.

Two suggestions were made at the lab:

  1. Change the RunLoopType in the XPC Plist from dispatch_main to NSRunLoop. We tried that, but it didn't make any difference.

  2. Add a key ProcessType with the value Interactive to the XPC Plist. This key is not documented for XPC services, only for launchd daemons, but we were told it should work for XPC services as well. We tried that too, both at the top level and inside the XPC sub-key, but that didn't make a difference either.
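
For reference, a sketch of how the two suggestions might look in the XPC service's Info.plist. This is a fragment, not a complete file. Placing ProcessType inside the XPCService dictionary is one of the two placements we tried; it is not documented for XPC services, and neither placement changed the behavior for us.

```xml
<!-- Fragment of the XPC service's Info.plist (illustrative sketch only) -->
<key>XPCService</key>
<dict>
    <!-- Suggestion 1: NSRunLoop instead of dispatch_main -->
    <key>RunLoopType</key>
    <string>NSRunLoop</string>
    <!-- Suggestion 2: undocumented for XPC services; also tried as a top-level key -->
    <key>ProcessType</key>
    <string>Interactive</string>
</dict>
```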

Another Idea That Didn't Work

Now that second suggestion made me look up that key in the man page for launchd.plist and what I found there was pretty interesting. Apparently there is a ProcessType value documented as

Adaptive

Adaptive jobs move between the Background and Interactive classifications based on activity over XPC connections. See xpc_transaction_begin(3) for details.

This seems to be our problem. Our XPC service is considered inactive when it processes messages over the mach port. Looking up the documentation of xpc_transaction_begin(3) tells me:

Services may extend the default behavior using xpc_transaction_begin() and xpc_transaction_end(), which increment and decrement the transaction count respectively. This may be necessary for services that send periodic messages to their clients, not in direct reply to a received message.

Using these two functions also frees us from having to enable and disable sudden termination on our own, as it is controlled automatically by them as well. Yet even using these two functions to indicate activity doesn't prevent us from being preempted at regular intervals: our priority still drops to level 4 while we are in the middle of processing (before we have called xpc_transaction_end()). We do seem to use them correctly, though, since sudden termination is disabled on our behalf as long as our XPC service remains in the active state (it only receives mach messages for processing while in that state) and is re-enabled when we leave the active state again.
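
A minimal sketch of how such bracketing might look in Swift. The helper name withXPCTransaction is hypothetical and only for illustration; xpc_transaction_begin() and xpc_transaction_end() are the functions from the man page quoted above. Note that this pattern kept sudden termination disabled for us but did not stop the priority from dropping.

```swift
import Foundation
import XPC

/// Hypothetical helper (not part of any Apple API): runs `work` inside an
/// XPC transaction so the service counts as active for its duration.
func withXPCTransaction<T>(_ work: () throws -> T) rethrows -> T {
    xpc_transaction_begin()            // increments the transaction count
    defer { xpc_transaction_end() }    // always balanced, even if `work` throws
    return try work()
}
```

A call site could then wrap the handling of one mach message, e.g. `withXPCTransaction { handle(message) }`, where `handle` stands for whatever processing routine the service already has.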

Final Thoughts

The man page for xpc_transaction_begin() also states:

The XPC runtime will also automatically manage the service's priority based on where a message came from. If an app sends a message to the service, the act of sending that message will boost the destination service's priority and resource limits so that it can more quickly fill the request. If, however, a service gets a message from a background process, the service stays at a lower priority so as not to interfere with work initiated as a direct result of user interaction.

This does not seem to work for the way we currently use the XPC service. Our mach port messages come either from a System Extension (Big Sur and up) or from a root daemon started by launchd (Catalina and below; ProcessType is Interactive and the nice value is -10), but apparently these messages cannot boost our XPC service, so it stays at low priority.

Did you ever figure this out? I'm seeing one of my many XPC processes getting backgrounded on Monterey and can't figure out why yet.

I had this same problem where our XPC service was stalled for 33ms at a time during intensive work. I found a workaround that I shared on StackOverflow: https://stackoverflow.com/a/71407357/7445

The only solution we found is the same one that jdv85 shared on StackOverflow. When the app calls a method of the XPC service, the priority of that XPC service is raised so that the request can be performed quickly, as the main app may have to wait for the result of that call. Once processing is over, which is indicated by calling the callback block of the request, the priority drops again after a while. If the request is never "answered", the system believes that the XPC service is still processing it, and so the priority stays up. This also keeps sudden termination disabled automatically, as the system will not terminate an XPC process while it believes it is still actively processing data, so we don't need to disable it manually anymore.
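
A rough sketch of the service side of this idea in Swift, assuming an NSXPCConnection-based service. The protocol, class, and method names are hypothetical and chosen only for illustration. Answering the held request later through endProcessing() is an addition for the sketch; the approach described above simply never answers.

```swift
import Foundation

// Hypothetical protocol exported by the XPC service (names are assumptions).
@objc protocol ProcessingServiceProtocol {
    // The app calls this once; the service deliberately holds the reply.
    func beginProcessing(withReply reply: @escaping () -> Void)
    // Optional: answer the held request when boosted processing is done.
    func endProcessing()
}

final class ProcessingService: NSObject, ProcessingServiceProtocol {
    private let lock = NSLock()
    private var pendingReply: (() -> Void)?

    func beginProcessing(withReply reply: @escaping () -> Void) {
        lock.lock()
        let previous = pendingReply
        pendingReply = reply   // keep the request "unanswered"
        lock.unlock()
        previous?()            // answer any older held request instead of dropping it
    }

    func endProcessing() {
        lock.lock()
        let reply = pendingReply
        pendingReply = nil
        lock.unlock()
        reply?()               // answering ends the open request
    }
}
```

As long as the reply block is held and not called, the request counts as still in progress, which is what keeps the priority up in the behavior described above.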

While this solution works, it's still just a workaround that may break one day in the future, e.g. if Apple decides to impose a time limit on XPC requests and to limit the priority of the XPC process anyway when a request receives no answer within a given time frame, on the grounds that the task is apparently very long running. It's not a permanent fix you can surely rely upon forever.

While this solution works, it's still just a workaround

I would not characterise this as a workaround. The fact that an open XPC transaction boosts priority and disables termination of your XPC Service is expected behaviour.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"
