XPC doesn't work with network extension on app upgrade

Our app has a network extension (as I've mentioned lots 😄). We do an upgrade by downloading the new package, stopping & removing all of our components except for the network extension, and then installing the new package, which then loads a LaunchAgent causing the containing app to run. (The only difference between a new install and upgrade is the old extension is left running, but not having anything to tell it what to do, just logs and continues.)

On some (but not all) upgrades... nothing ends up able to communicate via XPC with the Network Extension. My simplest CLI program to talk to it gets:

Could not create proxy: Error Domain=NSCocoaErrorDomain Code=4099 "The connection to service named blah was invalidated: failed at lookup with error 3 - No such process." UserInfo={NSDebugDescription=The connection to service named bla was invalidated: failed at lookup with error 3 - No such process.}
Could not communicate with blah

Restarting the extension by doing a kill -9 doesn't fix it; neither does restarting the control daemon. The only solution we've come across so far is rebooting.

I filed FB11086599 about this, but does anyone have any thoughts about it?

installing the new package, which then loads a LaunchAgent causing the containing app to run. 

If you just install a new version of the container app and run the OSSystemExtensionRequest to replace the existing Network System Extension, does it work?

It's hard to tell, since it works properly at least 80% of the time... Next time it happens, I'll try getting it to think it's a fresh install.

It happened to someone again, and I had them unload all of the daemons and the agent, and clean up /Applications and the application support directory, and then re-install.

Same behaviour. It did install the right version, according to systemextensionctl list.
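For anyone who wants to script that version check, here is a minimal sketch. It assumes each line of systemextensionctl list output contains the bundle ID followed by "(version/build)" and a bracketed state such as "[activated enabled]"; that layout is an assumption based on commonly seen output, so verify it on your own machine. The function name is made up for illustration.

```shell
#!/bin/sh
# has_active_extension BUNDLE_ID VERSION
# Reads `systemextensionctl list` output on stdin and succeeds only if a line
# mentions BUNDLE_ID, "(VERSION/" and "[activated enabled]".
# The line layout is an assumption; adjust the patterns if yours differs.
has_active_extension() {
    grep -F "$1" | grep -F "($2/" | grep -F -q "[activated enabled]"
}
```

Usage would look like: systemextensionctl list | has_active_extension com.company.feature 1.2.3 (both arguments are placeholders). Note that this only confirms what sysextd reports; as described above, the right version can be listed while XPC still fails.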

An update: the user did a manual install (opening the pkg file) while it was in this state, and things worked properly after this. I'm going to see if that is reproducible.

We've now gotten to the point where we can get it to reproduce -- not on demand, but within an hour or two. This involves opening up a thousand web pages (yay osascript), closing the windows, and then installing another version. Upgrade or downgrade doesn't seem to matter. The extension gets loaded, and starts, but nothing can communicate with it via XPC. Doing the install again fixes it. I am going to add explicit "install extension" and "uninstall extension" menu actions to the containing app, to see if that also does it (I suspect it will). I still have absolutely no idea what is going on here.

I'm also pretty annoyed that the invalidation handler doesn't get called for a surprisingly long period.

I can fix the issue by unloading the extension (which requires user interaction), and then reloading the extension (which requires more user interaction).

This is not really acceptable, of course. I have no idea what is causing it :(.

Hello, I have also encountered this problem. My logic is to ensure that only the system extension of the newer version can be installed before the system extension of the old version can be run. However, after I upgrade the system extension, the XPC server for the system extension is not started. Have you solved it yet, and if so, how? Please let me know. Thank you.

Hello, I have encountered the same problem you have described. I can fairly easily reproduce it on a MacBook with an M1 Pro chip, but we are having difficulty reproducing it on Intel Macs. I therefore think it might have something to do with additional security checks that Arm Macs might perform.

System restart usually fixes this issue. Killing the extension and/or client which tries to connect to it does not help.

As I said: I added code to the containing app to periodically try to communicate with the extension, and if it can't, it then unloads and reloads it. This does fix it -- at the cost of having multiple GUI prompts.
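The check described above lives in the containing app, but the same idea can be sketched in shell for testing: probe a few times, and only if every attempt fails, trigger the reload. This is only a sketch. The probe and recovery commands are placeholders for whatever CLI you use to talk to the extension and whatever triggers the unload/reload; neither is a real tool, and the function names are made up.

```shell
#!/bin/sh
# probe_with_retry ATTEMPTS DELAY_SECONDS PROBE_CMD [ARGS...]
# Returns 0 as soon as the probe succeeds, 1 if every attempt fails.
probe_with_retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        # Wait before the next attempt, but not after the last one.
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# reload_if_unreachable PROBE_CMD RECOVER_CMD
# Both arguments are hypothetical commands supplied by the caller.
# PROBE_DELAY (seconds between attempts) defaults to 5.
reload_if_unreachable() {
    if probe_with_retry 3 "${PROBE_DELAY:-5}" "$1"; then
        echo "reachable"
    else
        echo "unreachable"
        "$2"
    fi
}
```

Retrying before reloading matters here because the reload costs the user several GUI prompts, so a single transient failure should not trigger it.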

The same problem here. It only happens on ARM; we haven't seen it on Intel. When replacing the Network System Extension from the postinstall script during installation, we hit this issue at least 50% of the time. Isn't it possible that the OS cannot correctly validate the signatures of the extension or client app, and that makes the XPC listener kick off the client?

We tried to do the following:

  1. Disable network filter (not removing, just changing its state to avoid OS dialog that prompts user to allow network filter)
  2. calling launchctl stop NetworkExtension.com.company.feature.version...

This stops the extension during registration and upgrade. We were hoping that later, when the network filter was enabled, the extension would start with a proper XPC listener that would accept connections from the client. It didn't help.

Our Network Extension is not sandboxed yet. Could sandboxing help? But it works after a reboot, so it's only an issue right after upgrading the extension.

My best guess is that launchd is confused about something, and can't map the name to the right port. Unloading the extension and reloading it causes launchd to reset the port, and thus seems to fix it. So simply stopping doesn't do the trick.

I've had no responses on my FB. Perhaps filing new ones and referencing mine might help?

I tried killing sysextd, nesessionmanager and other processes to reset XPC. Killing launchd was my other consideration, but that's a rather severe hack to work around a bug in XPC initialization.

Perhaps this could help fix the XPC invalidation. Add the following to the postinstall script:

disable network filter
stop network extension
spctl -a -vv "${APP_PATH}"
spctl -a -vv "/Applications/${PRODUCT_NAME}.app/Contents/Helpers/${MANAGER}.app"
spctl -a -vv "/Applications/${PRODUCT_NAME}.app/Contents/Helpers/${MANAGER}.app/Contents/Library/SystemExtensions/com.company.feature.dev.systemextension"
upgrade system extension
enable network filter
connect XPC client

But it is possible that simply using sleep 30 before upgrading the system extension would accomplish the same :-)

It would appear that the problem is solved if I add only the following steps to the preinstall script:

  1. disable network filter
  2. stop network system extension

After that, installd replaces the bundle and the system extensions in it. postinstall then runs our helper, which registers the system extension and starts the network filter (starts TransparentProxy/AppProxy).

I guess that stopping the system extension while the old bundle still exists, then replacing the bundle and upgrading the system extension (basically it just registers the new version of the extension, as the process is not running anymore), causes the OS (launchd) to clear its XPC-related caches.
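A rough sketch of what such a preinstall script could look like follows. It is not the exact script from this post. The helper path and its --disable-filter flag are hypothetical stand-ins for whatever code disables your filter configuration, and the launchd label must be replaced with your extension's real one. launchctl stop is the command mentioned earlier in the thread. The DRY_RUN switch is an addition so the sequence can be inspected without touching the system.

```shell
#!/bin/sh
# Preinstall sketch. HELPER and EXTENSION_LABEL are placeholders (assumptions).
HELPER="${HELPER:-/path/to/your/helper}"
EXTENSION_LABEL="${EXTENSION_LABEL:-your.extension.label}"

# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

preinstall_steps() {
    # 1. Disable the network filter (helper and flag are hypothetical).
    run "$HELPER" --disable-filter || return 1
    # 2. Stop the running network system extension while the old bundle
    #    is still on disk.
    run launchctl stop "$EXTENSION_LABEL" || return 1
}
```

A real preinstall script would end with preinstall_steps || exit 1, so that the installer aborts if either step fails. Running it first with DRY_RUN=1 prints the two commands in order.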

@Robert_Developer

What are the commands to perform these steps in the preinstall script? As of now, these are the operations we perform in preinstall:

  1. Unload the application from launchctl (launchctl asuser $GSUID /bin/launchctl unload $AGENT_PLIST_PATH)
  2. rm -rf /Library/LaunchAgents/com.myapp.plist
  3. We kill both UI and Extension process in preinstall using pkill