Rare issue with Network Extension and VPN profile on iOS platform

Hi!

We are developing VPN software for the iOS platform, and our customers report a rare issue that we cannot reproduce. We seek any advice about the root cause of such a problem.

On every update, we notice an increased number of customer reports saying that the tunnel process is in a "connecting" loop, and to break the loop the customer has to remove the VPN profile from the settings. As none of our testers could reproduce the issue, we have minimal knowledge to work on. What we know so far:

  • The OnDemand rules cause the tunnel process to be restarted in the loop
  • The tunnel process does not start at all. We have logs from our customers, and we know that the application tries to start an extension, but the extension does not start at all. Something in the operating system prevents the extension from starting.
  • The issue reappears on every app update.

My theory so far is that the profile gets broken during an update process, but we have no means of confirming that.

Is this a known issue? Any advice on how could we reproduce the problem? Thank you in advance for any tips!

I’m not aware of a specific problem here, but I’ve seen various issues like this in the past. Before I send you off to file a bug, there’s one thing I’d like to confirm. You wrote:

The OnDemand rules cause the tunnel process to be restarted in the loop

and:

but the extension does not start at all. Something in the operating system prevents the extension from starting.

which I’d like to clarify. This seems to suggest that the system is starting your extension in the on-demand case but not when you start the VPN manually. Is that right?

In Debugging a Network Extension Provider I explain the concept of a ‘first light’ log point. I suggest that you add this to your provider and then have your custom reproduce the problem and capture a sysdiagnose log immediately after seeing the problem. That’ll confirm whether your provider has started at all.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

This seems to suggest that the system is starting your extension in the on-demand case but not when you start the VPN manually. Is that right?

Ah, that was a mental shortcut from my side. Our application always sets the OnDemand rules upon starting the tunnel, and we deactivate it only when the user explicitly disconnects. After setting onDemand rules to true, our application starts the tunnel process.

As far as we know (as I said, we have very little data to work on, only a couple of confirmed cases), the OnDemand rules do not cause the issue per se; they are only responsible for the strange state of "flickering" our users experience - the system is not able to start the tunnel process for some reason. Because of onDemand rules, it tries again, and again, and again...

We are currently developing telemetrics to learn how widespread this issue is for our users, but we have run out of ideas on how to reproduce it, and we don't want to ask our users for full sysdiagnose due to privacy concerns.

I would greatly appreciate any theory you might have so we can experiment with it and maybe finally reproduce it. Also, without clear steps of reproduction we cannot actually file a bug report and hope it will be fixed.

I would greatly appreciate any theory you might have

I don’t have any great theories. Problems like this are more prevalent on macOS, where there’s a lot more flexibility in how apps, and hence app extensions, are installed. On iOS and its child platforms, things are much more tightly controlled, and hence less susceptible to weird issues.

I don’t see any good way of investigating this without a sysdiagnose log taken shortly after reproducing the problem. Ideally that’d be on a device that has the VPN (Network Extension) debugging profile installed. For more on this, see our Bug Reporting > Profiles and Logs page.

However, getting this will be tricky given that you can’t reproduce the problem. I have some general advice on that topic in Using a Sysdiagnose Log to Debug a Hard-to-Reproduce Problem, but I’m not sure whether it’ll actually be useful in your specific situation.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Thank you for the link to the Profiles and Logs page; I didn't know that one!

Yeah, on macOS, we are kind of able to reproduce the issue (at least something that looks like the issue we are chasing) by first installing our app from XCode and then installing it from the AppStore (without removing the old profile). We see then in the system logs that the certificates of the profiles do not match and the system does not start the Network Extension. I'm just having a hard time figuring out how this can happen for our regular users or on iOS...

We have also started experiencing this issue recently. Similarly, users who have updated the app find that the VPN automatically activates on demand once the app update is complete. However, during this process, all networking stops. Trying to turn the VPN on or off by launching the app does not trigger any action from the packettunnelprovider, as if it were deadlocked. Even deleting the VPN profile or reinstalling the app does not resolve the issue. The only solution is to restart the device, which fixes the problem.

Additionally, it is peculiar that all the users who reported this issue are on iOS version 17.3.1.

We see then in the system logs that the certificates of the profiles do not match and the system does not start the Network Extension.

Yeah, I’ve seen reports of that before.

I'm just having a hard time figuring out how this can happen for our regular users or on iOS...

Likewise. iOS uses very different mechanisms to identify code than macOS.

Any chance that TestFlight is in this mix?

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Any chance that TestFlight is in this mix?

We tried to reproduce it very hard by performing updates with TestFlight but have had no luck so far. Our only reports are from regular users who are not using TestFlight.

Amazingly, I just had a similar occurrence on my Mac this morning. We use a third-party app in a work-from-home setup, and just this morning, it went crazy and was flickering "connecting->disconnected" very quickly. I see that the state was changed every 100 milliseconds in the system logs. The NetworkExtension complains:

2024-02-21 10:36:31.676860+0100 0x1a427    Default     0x0                  518    0    nesessionmanager: [com.apple.networkextension:] Found 0 (0 active) registrations for <tunnel provider> (com.apple.networkextension.packet-tunnel)
2024-02-21 10:36:31.677349+0100 0x1a427    Default     0x0                  518    0    nesessionmanager: [com.apple.networkextension:] NESMVPNSession[Primary Tunnel:<provider>:496B2DE4-FD7D-482F-9607-4B3021283266:(null)]: Leaving state NESMVPNSessionStatePreparingNetwork
2024-02-21 10:36:31.677522+0100 0x1a427    Default     0x0                  518    0    nesessionmanager: [com.apple.networkextension:] NESMVPNSession[Primary Tunnel:<provider>:496B2DE4-FD7D-482F-9607-4B3021283266:(null)]: Entering state NESMVPNSessionStateStarting, timeout 300 seconds
2024-02-21 10:36:31.678820+0100 0x1a429    Default     0x0                  518    0    nesessionmanager: [com.apple.networkextension:] NESMVPNSession[Primary Tunnel:<provider>:496B2DE4-FD7D-482F-9607-4B3021283266:(null)] in state NESMVPNSessionStateStarting: plugin NEVPNTunnelPlugin(<plugin>[inactive]) started with PID 0 error Error Domain=NEAgentErrorDomain Code=2 "(null)"
2024-02-21 10:36:31.678897+0100 0x19c0e    Default     0x0                  15325  0    VPN: (NetworkExtension) [com.apple.networkextension:] Last disconnect error for <3rd party app> changed from "none" to "The VPN app used by the VPN configuration is not installed"

I obfuscated the name of the app, since I'm not sure I should share that publicly.

Does the error The VPN app used by the VPN configuration is not installed ring any bell? I tried to Google it quickly, but no luck so far.

I'm not sure if it's related to the issues we observe, but from the user perspective, the issue looks exactly the same.

Does the error The VPN app used by the VPN configuration is not installed ring any bell?

These sorts of problems are more common on macOS, because NE relies on PluginKit [1] which in turn relies on Launch Services. If the Launch Services database is borked — which is common on development machines, but also happens on normal user machines — then you can easily run into problems like this.

However, none of that applies on iOS where there’s only one way to install apps [2].

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

[1] This is an internal subsystem to manage extensions. It’s notably because there’s a pluginkit command-line tool.

[2] Well, two, if you include Xcode, but that’s not relevant on user machines.

I have the same issue with Orbot iOS suddenly. At least, when I install it via Xcode: My iPhone 15 Pro runs iOS 17.3.1.

I tried to debug the hell out of it, following your logging advice. Couldn't find any traces of the VPN ever starting. No "first light" log. What I saw was a lot of messages about that the "TorVPN" process (the Network Extension) doesn't have a PID.

I restarted the device, and, wouldn't you know? Everything worked just fine, suddenly. There is definitely a bug with iOS 17.3.1.

Rare issue with Network Extension and VPN profile on iOS platform
 
 
Q