I've submitted a feedback issue about this, with sample code. It's extremely easy to reproduce with a minimal VPN. Right now it's blocking release of anything using includeAllNetworks for us, because loss of DNS resolution is completely unacceptable.
Hopefully there's some simple way to resolve the problem.
FB13331886
One odd thing is that it's only things that rely on DNS that appear to be breaking. If I try to ssh/ping/etc. to a system by IP address it works fine.
It looks like there's a supportsDefaultDrop flag set on the config when includeAllNetworks is set:
2023-10-27 13:13:22.077480-0700 0x50591 Debug 0xb588c 320 0 nesessionmanager: [com.apple.networkextension:] applyIPDefaultDrop: session TestConfig <supportsDefaultDrop 1> <disableDefaultDropAfterBoot 0>
But it's not clear why this would only have an effect when we disconnect from the extension instead of calling stopTunnel from the management app. Or why it would only affect name resolution.
The cleanup sequence from nesessionmanager looks quite different for the case where we're stopping the tunnel from the extension:
Disconnect started by framework
Note that we don't see an "entering disposing state" here. When the disconnect is triggered from the management app we see:
11:18:28.084338-0700 nesessionmanager com.example.vpn.app[1854]: disposing
11:18:28.084763-0700 nesessionmanager com.example.vpn.app[1854]: disposed, tearing down agent connection
11:18:38.722862-0700 nesessionmanager NESMVPNSession[Primary Tunnel:TestConfig:67D0C9ED-FFBC-4FF7-B0CF-9195809EC6FB:(null)] in state NESMVPNSessionStateDisposing: plugin NEVPNTunnelPlugin(com.example.vpn.app[inactive]) dispose complete
11:18:38.722928-0700 nesessionmanager NESMVPNSession[Primary Tunnel:TestConfig:67D0C9ED-FFBC-4FF7-B0CF-9195809EC6FB:(null)] in state NESMVPNSessionStateDisposing: all plugins have disposed
11:18:38.723301-0700 nesessionmanager NESMVPNSession[Primary Tunnel:TestConfig:67D0C9ED-FFBC-4FF7-B0CF-9195809EC6FB:(null)]: Leaving state NESMVPNSessionStateDisposing
The network is fine at this point. If we delete the VPN config we see:
11:19:02.495277-0700 configd SCNC: stop, triggered by (114) configd, type com.example.vpn.app, reason Service Disposed
Disconnect started by cancelTunnelWithError
When the VPN shutdown is triggered from the extension we see multiple messages about disposing & teardown, but nothing about leaving the disposing state:
11:14:25.887783-0700 nesessionmanager com.example.vpn.app[1854]: disposing
11:14:25.888103-0700 nesessionmanager com.example.vpn.app[1854]: disposed, tearing down agent connection
11:14:42.306910-0700 nesessionmanager com.example.vpn.app[1854]: disposing
11:14:42.307280-0700 nesessionmanager com.example.vpn.app[1854]: disposed, tearing down agent connection
11:15:47.831509-0700 nesessionmanager NESMVPNSession[Primary Tunnel:TestConfig:3CA66354-3DDF-40FF-8C59-473ED2545DFB:(null)] in state NESMVPNSessionStateStopping: plugin already disconnected, disposing all plugins
11:15:47.831604-0700 nesessionmanager NESMVPNSession[Primary Tunnel:TestConfig:3CA66354-3DDF-40FF-8C59-473ED2545DFB:(null)]: Entering state NESMVPNSessionStateDisposing, timeout 5 seconds
11:15:47.831643-0700 nesessionmanager com.example.vpn.app[1854]: disposing
11:15:47.833142-0700 nesessionmanager com.example.vpn.app[1854]: disposed, tearing down agent connection
At this point the network is dead to DNS. When we delete the VPN configuration entirely we get:
11:16:12.804142-0700 configd SCNC: stop, triggered by (114) configd, type com.example.vpn.app, reason Service Disposed
11:16:12.891252-0700 nesessionmanager NESMVPNSession[Primary Tunnel:TestConfig:3CA66354-3DDF-40FF-8C59-473ED2545DFB:(null)] in state NESMVPNSessionStateStopping: plugin already disconnected, disposing all plugins
11:16:12.891377-0700 nesessionmanager NESMVPNSession[Primary Tunnel:TestConfig:3CA66354-3DDF-40FF-8C59-473ED2545DFB:(null)]: Entering state NESMVPNSessionStateDisposing, timeout 5 seconds
11:16:12.891438-0700 nesessionmanager NESMVPNSession[Primary Tunnel:TestConfig:3CA66354-3DDF-40FF-8C59-473ED2545DFB:(null)] in state NESMVPNSessionStateDisposing: no plugins to dispose
11:16:12.891803-0700 nesessionmanager NESMVPNSession[Primary Tunnel:TestConfig:3CA66354-3DDF-40FF-8C59-473ED2545DFB:(null)]: Leaving state NESMVPNSessionStateDisposing
And name resolution immediately starts working.
Completely reproducible.
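To compare the two shutdown paths without reading the logs by eye, the state transitions can be extracted mechanically. The following is a minimal Python sketch, assuming the log has been exported as plain text lines in the format quoted above. The function name `unfinished_states` and the regular expression are my own and are not part of any Apple tooling.

```python
import re

# Matches the "Entering state X" / "Leaving state X" transitions that
# nesessionmanager logs for an NESMVPNSession (format taken from the
# excerpts quoted above; other log formats are not handled).
TRANSITION = re.compile(r"(Entering|Leaving) state (NESMVPNSessionState\w+)")


def unfinished_states(log_lines):
    """Return the states that were entered but never left, in order of entry."""
    open_states = []
    for line in log_lines:
        match = TRANSITION.search(line)
        if match is None:
            continue
        action, state = match.groups()
        if action == "Entering":
            # Re-entering a state that is already open is recorded only once.
            if state not in open_states:
                open_states.append(state)
        elif state in open_states:
            open_states.remove(state)
    return open_states
```

Run against the extension-initiated excerpt up to the point where DNS dies, this reports NESMVPNSessionStateDisposing as still open. Once the lines logged after deleting the configuration are included, the list is empty.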
I've narrowed down the problem a little bit. It appears that the issue occurs when the tunnel shutdown is initiated from the extension (for example on timeouts or server disconnects) rather than by the system.
Starting the extension is pretty standard. We create the connection to the server, set up the network settings, and end up calling the completion handler passed in to startTunnelWithOptions:completionHandler:.
Stopping the extension uses largely common code, whether the shutdown is initiated by the system or user (stopTunnelWithReason:completionHandler:) or by the extension itself.
For shutdowns initiated by stopTunnelWithReason:completionHandler: our final call is to the completion handler passed in, while for shutdowns initiated in the extension we end up calling cancelTunnelWithError:.
Can we just add the SystemExtension entitlement to the current App ID for the network extension, update the profile, and continue with the same ID, or will we need to define a new ID?
This question doesn’t make sense. An App ID has capabilities which flow into the entitlement allowlist in your provisioning profile. When you create a profile from an App ID with the Network Extension capability enabled, you get different results depending on the type of provisioning profile:
Most profiles end up with the standard NE values in the allow list.
Developer ID profiles end up with values that have the -systemextension suffix.
The App ID configuration has an entry which is "System Extension". I shouldn't have used the term "Entitlement" since on the web site it talks about "Capabilities"...
In the case of an App Store distribution, which we'd like to keep doing, it wouldn't be a Developer ID profile, so this is really asking about that case and whether we can just add the System Extension capability to the ID & update the App Store distribution profile. I understand that we'd need a new profile for the out-of-app-store distribution.
The file and keychain answers are pretty much what we expected--we expect that we're likely to need to do some XPC to talk between the two modules.
If the defaultRoute is set in IPv4Settings.includedRoutes then exclusions don't work.
In the documentation it says that defaultRoute is "A convenience method for creating the default IPv4 route", which of course is 0.0.0.0/0.
Any addresses set in excludedRoutes continue to be tunneled. If we set 0.0.0.0/1 and 128.0.0.0/1 in IPv4Settings.includedRoutes instead, things work, and the excluded routes are excluded properly. This is similar enough to the behavior with ATS that I'd suspect similar underlying logic.
Cool. Sounds like I don't need to file another issue about that, unless it would help with finding out when the problem is fixed.
I hope they're looking at fixing this issue throughout the Network Extension code, not just for the ATS exclusions.
Kevin
kjbrock icloud com
Thanks for the response. Hope it's fixed soon, because we're trying to test for day 0 compatibility with the new releases, and if the fix doesn't come out soon that won't work very well.
We have enough tests that manually running through each of them every time is impractical.
Well, 0.0.0.0/0 does not work. Specifying a specific subnet or IP address works.
This is true with a lot of other Network Extension APIs. 0.0.0.0/0 isn't treated as an address, it's treated like a flag. You can't specify exclusions on the default subnet for example.
If you define both 128.0.0.0/1 and 0.0.0.0/1, which together cover the same address space as 0.0.0.0/0, it works as expected.
Someone really dropped the ball on default network handling in the Network Extension code though--if it can be specified as a CIDR address it should be treated as a CIDR address, and it's not.
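The claim that the two /1 routes are equivalent to the default route is easy to check. Here is a small Python sketch using the standard `ipaddress` module. The helper name `covers_same_space` is made up for illustration, and this only checks the arithmetic; it says nothing about how Network Extension treats the routes.

```python
import ipaddress


def covers_same_space(split_routes, whole="0.0.0.0/0"):
    """Return True if the given routes collapse to exactly the `whole` network."""
    networks = [ipaddress.ip_network(route) for route in split_routes]
    # collapse_addresses merges adjacent and overlapping networks into
    # the smallest equivalent list of networks.
    collapsed = list(ipaddress.collapse_addresses(networks))
    return collapsed == [ipaddress.ip_network(whole)]


if __name__ == "__main__":
    print(covers_same_space(["0.0.0.0/1", "128.0.0.0/1"]))
```

So the split routes describe the same set of addresses as 0.0.0.0/0. The difference in behavior comes from how the default route is special-cased, not from the address space being routed.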
Thanks. I'll take a look.
What we need to validate is whether adding all IP addresses to the exception domains means that no cert validation will be done.
If so, then that's not quite what we want. Prior to iOS 17 we'd get a callback with a recoverable trust failure, check to see that it was a hostname issue, and then see if the host name that we know is in the certificate, continuing if so and cancelling if not.
I did know that. It's still a compile time setting, and we don't know what needs to be excepted at compile time.
The only way to make that work would be to specify 0.0.0.0/0, which is pretty close to turning ATS off.
We don't know the IP address in advance either.
The domain name(s) and the IP address(es) are all dynamic--customers configure all of those.
I see no way in the documentation to say "don't apply ATS to connections made by any IP address" in the general case, just ways to configure either exceptions for specific domains/IP addresses or a bypass of ATS altogether.
Are you thinking of NSAllowsLocalNetworking? The description of its behavior in the documentation is really unclear, and sounds like it contradicts the ATS changes and observed behavior in the betas when it speaks about .local domains, unqualified domains, and IP addresses ("In iOS 10 and macOS 10.12 and later, ATS allows all three of these connections by default, so you no longer need an exception for any of them").
I've been amending the feedback issues mentioned earlier in the thread, but I suppose this should be created as a new feedback.
The behavior of write() still seems wrong as well. No matter what I'm trying to write, the file size & contents shouldn't be changing if there's a failure return.
It looks like write() is updating the file size and then trying to write data, so when there's a problem at that stage we still see a modified file. It should be checking for data to write first.
My understanding was that the NSExceptionDomains value had to be set in the Info.plist file, and thus at build time, as mentioned in
https://developer.apple.com/documentation/bundleresources/information_property_list/nsapptransportsecurity
We don't know which domains need to be handled differently at that point. Each customer could have multiple domains configured about which we know nothing.
We could turn ATS off, but that seems extreme (-ly unsafe). What we're looking to do is validate by hostname when we're connecting to an IP address for which we know the hostname. We can't just connect by hostname at certain stages, because it will break in the case of load-balanced hostnames which can resolve to multiple IPs.
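To make the desired behavior concrete, here is a minimal sketch of "connect to a known IP, validate the certificate against a known hostname". It is written in Python with the standard `ssl` module purely as a language-neutral illustration. It is not the URLSession or ATS API, and the function names `strict_context` and `connect_by_ip` are hypothetical.

```python
import socket
import ssl


def strict_context():
    """Build a TLS context that keeps full chain and hostname validation on."""
    # create_default_context() enables CERT_REQUIRED and check_hostname.
    return ssl.create_default_context()


def connect_by_ip(ip, expected_hostname, port=443, timeout=5.0):
    """Open a TLS connection to `ip` but validate the cert for `expected_hostname`."""
    context = strict_context()
    raw = socket.create_connection((ip, port), timeout=timeout)
    try:
        # server_hostname drives both SNI and the certificate hostname check,
        # so the TCP connection goes to the chosen IP while the certificate
        # must still match the hostname we already know.
        return context.wrap_socket(raw, server_hostname=expected_hostname)
    except Exception:
        raw.close()
        raise
```

With this shape, a load-balanced hostname that resolves to several IPs can be pinned to one specific address per connection, and a certificate that does not name the expected host fails the handshake instead of being accepted.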