`NEVPNProtocol.includeAllNetworks` and `NEPacketTunnelProvider.createTCPConnectionThroughTunnel`

The .includeAllNetworks flag on the NEVPNProtocol object seems suitable for use as a VPN "kill switch." At the very least, the documentation specifies that "if this value is true and the tunnel is unavailable, the system drops all network traffic." Our application has a UI element that lets the user toggle this setting to ensure that all of their traffic is sent through the VPN connection.
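For anyone following along, toggling the flag from the containing app might look roughly like the sketch below. It assumes a single saved tunnel configuration, and `setKillSwitch` is a hypothetical helper name, not something from our app or from WireGuardKit.

```swift
import NetworkExtension

/// Hypothetical helper: flips includeAllNetworks on the first saved tunnel
/// configuration and persists the change.
@available(iOS 14.0, macOS 10.15, *)
func setKillSwitch(enabled: Bool, completion: @escaping (Error?) -> Void) {
    NETunnelProviderManager.loadAllFromPreferences { managers, error in
        if let error = error {
            completion(error)
            return
        }
        guard let manager = managers?.first,
              let proto = manager.protocolConfiguration else {
            // Assumption: with no saved configuration this sketch does nothing.
            completion(nil)
            return
        }
        // includeAllNetworks is declared on NEVPNProtocol, the base class of
        // NETunnelProviderProtocol.
        proto.includeAllNetworks = enabled
        manager.protocolConfiguration = proto
        manager.saveToPreferences(completionHandler: completion)
    }
}
```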

We're encountering an issue, however: it appears that, with this setting enabled, any NWTCPConnection returned by NEPacketTunnelProvider.createTCPConnectionThroughTunnel will never connect. It stays in the .connecting state and never advances to the .connected state. The documentation for this method states that this method can be used "to create a TCP connection to an endpoint inside the private network."

Does this mean that the remote endpoint being connected to by createTCPConnectionThroughTunnel must reside inside the private network being connected to by the tunnel in order for it to work properly with the .includeAllNetworks setting? Or is the documentation simply suggesting that the TCP connection is tunneled through the private network?

Other web pages seem to load just fine while this tunnel is active; only the connections returned by this function time out with .includeAllNetworks set to true. If I set it to false, the NWTCPConnection objects returned by this function transition to the .connected state just fine and data can be passed through them with no problems. Is this expected behavior, or is this a possible manifestation of something misconfigured in the VPN profile?

Edit: I tested a bit more and it looks like even local connections over the private network seem to time out; I set up an endpoint within the VPN at 10.1.0.1 and createTCPConnectionThroughTunnel was still unable to connect with .includeAllNetworks set to true.
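A minimal sketch of the kind of probe described above, for reference. It assumes it is called from inside the packet tunnel provider; `probe` and `label(for:)` are hypothetical names, and the port in the usage comment is an assumption since the test above only names the address.

```swift
import NetworkExtension

/// Maps a connection state to a short label for logging.
func label(for state: NWTCPConnectionState) -> String {
    switch state {
    case .invalid: return "invalid"
    case .connecting: return "connecting"
    case .waiting: return "waiting"
    case .connected: return "connected"
    case .disconnected: return "disconnected"
    case .cancelled: return "cancelled"
    @unknown default: return "unknown"
    }
}

/// Hypothetical helper: opens a TCP connection through the tunnel and logs
/// every state change. The caller must keep both returned values alive,
/// otherwise the observation (and the connection) goes away.
func probe(through provider: NEPacketTunnelProvider,
           host: String,
           port: String) -> (NWTCPConnection, NSKeyValueObservation) {
    let endpoint = NWHostEndpoint(hostname: host, port: port)
    let connection = provider.createTCPConnectionThroughTunnel(
        to: endpoint,
        enableTLS: false,
        tlsParameters: nil,
        delegate: nil)
    // NWTCPConnection exposes its state through key-value observing.
    let observation = connection.observe(\.state, options: [.initial, .new]) { conn, _ in
        NSLog("probe %@:%@ state: %@", host, port, label(for: conn.state))
    }
    return (connection, observation)
}

// Usage inside the provider (the port is an assumption):
// let (connection, observation) = probe(through: self, host: "10.1.0.1", port: "443")
```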

Another update: it looks like if I use URLSession instead of createTCPConnectionThroughTunnel to create a TCP stream, the connection works just fine with .includeAllNetworks set to true, but the connection doesn't go through the tunnel. It would be rather ironic if users switched this setting on expecting a more secure connection, only to have the app be forced to use a part of the API that doesn't tunnel a portion of their networking...

Does this mean that the remote endpoint being connected to by createTCPConnectionThroughTunnel must reside inside the private network being connected to by the tunnel in order for it to work properly with the .includeAllNetworks setting?

When you use createTCPConnectionThroughTunnel, the API creates a new TCP connection bound to the tunnel's interface. So if the NWEndpoint you pass has a remote address that is reachable over that interface, then all "should" be good.

The includeAllNetworks flag causing an issue here is an interesting wrinkle. Do you have any other providers installed on the device / machine that you are working with? Also, what do you have the tunnelRemoteAddress set to in NEPacketTunnelNetworkSettings? Is it the destination IP of your VPN server?

Lastly, are there any logs in Console.app that show where the TCP connection is getting stuck?

I inserted a breakpoint where the app calls NETunnelProviderManager.loadAllFromPreferences, and it called the completion with only one value, so there are no other active providers on the device. When I looked for a place where we set tunnelRemoteAddress, I found this bit of code in WireGuardKit:

    func generateNetworkSettings() -> NEPacketTunnelNetworkSettings {
        /* iOS requires a tunnel endpoint, whereas in WireGuard it's valid for
         * a tunnel to have no endpoint, or for there to be many endpoints, in
         * which case, displaying a single one in settings doesn't really
         * make sense. So, we fill it in with this placeholder, which is not
         * a valid IP address that will actually route over the Internet.
         */
        let networkSettings = NEPacketTunnelNetworkSettings(tunnelRemoteAddress: "127.0.0.1")

For one, WireGuardKit is an external dependency of the application, so going through the necessary steps to change it would be possible, but more difficult than just changing a value. When I overrode this piece of code on my machine to set it to the remote endpoint's address, however, I noticed that the connections were still timing out and the issue was not fixed.
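Roughly, the override I tried amounts to the sketch below. All addresses are placeholders (203.0.113.10 is a documentation address, not our server), and `makeSettings` is a hypothetical helper rather than WireGuardKit's actual code.

```swift
import NetworkExtension

/// Hypothetical helper: builds tunnel settings whose tunnelRemoteAddress is
/// the real VPN server address instead of the 127.0.0.1 placeholder.
func makeSettings(serverAddress: String,
                  tunnelAddress: String,
                  subnetMask: String) -> NEPacketTunnelNetworkSettings {
    let settings = NEPacketTunnelNetworkSettings(tunnelRemoteAddress: serverAddress)
    let ipv4 = NEIPv4Settings(addresses: [tunnelAddress], subnetMasks: [subnetMask])
    // Route all IPv4 traffic through the tunnel.
    ipv4.includedRoutes = [NEIPv4Route.default()]
    settings.ipv4Settings = ipv4
    return settings
}

// Placeholder values, for illustration only.
let exampleSettings = makeSettings(serverAddress: "203.0.113.10",
                                   tunnelAddress: "10.1.0.2",
                                   subnetMask: "255.255.255.0")
```

The resulting object would then be passed to setTunnelNetworkSettings inside the provider.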

These are the only logs in Console that I see in the extension related to the network connection. The "cancelled" messages are due to the application timing out the connection and trying again. Originally the timeout was set to 60s; I moved it down to 10s to make it easier to see in the logs:

5657	default	10:53:58.307219+0200	MyNetworkExtension	[C6 687052A1-4CA6-430A-9EA9-D39FD8E7F92E Hostname#dddc988d:443 tcp, tls, attribution: developer, context: Default Network Context (private), proc: 58CA1BEC-474A-365D-857F-027B884B9BED, required interface: utun3(16)] start
5657	default	10:53:58.308305+0200	MyNetworkExtension	[C6 Hostname#dddc988d:443 initial path ((null))] event: path:start @0.000s
5657	default	10:53:58.309396+0200	MyNetworkExtension	[C6 Hostname#dddc988d:443 waiting path (unsatisfied (Interface utun3[16] is required by parameters), interface: en0, scoped, ipv4, ipv6, dns)] event: path:unsatisfied @0.000s, uuid: D2339D03-2826-47AE-88E8-D122D0F78EB0
5657	default	10:53:58.309731+0200	MyNetworkExtension	nw_connection_report_state_with_handler_on_nw_queue [C6] reporting state waiting
5657	default	10:53:58.311177+0200	MyNetworkExtension	[C6 Hostname#dddc988d:443 waiting path (unsatisfied (Interface utun3[16] is required by parameters), interface: en0, scoped, ipv4, ipv6, dns)] event: null:null @0.001s
5657	default	10:53:58.312155+0200	MyNetworkExtension	[C7 15EA8717-040A-4634-BE44-9168217C792C Hostname#e7bf2ee1:443 tcp, tls, attribution: developer, context: Default Network Context (private), proc: 58CA1BEC-474A-365D-857F-027B884B9BED, required interface: utun3(16)] start
5657	default	10:53:58.312637+0200	MyNetworkExtension	[C7 Hostname#e7bf2ee1:443 initial path ((null))] event: path:start @0.000s
5657	default	10:53:58.313308+0200	MyNetworkExtension	[C7 Hostname#e7bf2ee1:443 waiting path (unsatisfied (Interface utun3[16] is required by parameters), interface: en0, scoped, ipv4, ipv6, dns)] event: path:unsatisfied @0.002s, uuid: D9AFA7E1-4F51-46AD-BC30-0DC0A8BD9C68
5657	default	10:53:58.313453+0200	MyNetworkExtension	nw_connection_report_state_with_handler_on_nw_queue [C7] reporting state waiting
5657	default	10:53:58.314203+0200	MyNetworkExtension	[C7 Hostname#e7bf2ee1:443 waiting path (unsatisfied (Interface utun3[16] is required by parameters), interface: en0, scoped, ipv4, ipv6, dns)] event: null:null @0.002s
5657	default	10:53:58.910804+0200	MyNetworkExtension	[C8 866110BB-E8A4-4239-8B55-82EBA9A396CC Hostname#dddc988d:443 tcp, tls, attribution: developer, context: Default Network Context (private), proc: 58CA1BEC-474A-365D-857F-027B884B9BED, required interface: utun3(16)] start
5657	default	10:53:58.911659+0200	MyNetworkExtension	[C8 Hostname#dddc988d:443 initial path ((null))] event: path:start @0.000s
5657	default	10:53:58.912632+0200	MyNetworkExtension	[C8 Hostname#dddc988d:443 waiting path (unsatisfied (Interface utun3[16] is required by parameters), interface: en0, scoped, ipv4, ipv6, dns)] event: path:unsatisfied @0.000s, uuid: D2339D03-2826-47AE-88E8-D122D0F78EB0
5657	default	10:53:58.912796+0200	MyNetworkExtension	nw_connection_report_state_with_handler_on_nw_queue [C8] reporting state waiting
5657	default	10:53:58.914132+0200	MyNetworkExtension	[C8 Hostname#dddc988d:443 waiting path (unsatisfied (Interface utun3[16] is required by parameters), interface: en0, scoped, ipv4, ipv6, dns)] event: null:null @0.002s
5657	default	10:54:07.321548+0200	MyNetworkExtension	[C6 687052A1-4CA6-430A-9EA9-D39FD8E7F92E Hostname#dddc988d:443 tcp, interface: utun3, tls, attribution: developer] cancel
5657	default	10:54:07.321672+0200	MyNetworkExtension	[C6 Hostname#dddc988d:443 tcp, interface: utun3, tls, attribution: developer] cancelled
5657	default	10:54:07.322047+0200	MyNetworkExtension	nw_connection_report_state_with_handler_on_nw_queue [C6] reporting state cancelled
5657	error	10:54:07.323093+0200	MyNetworkExtension	-[NWTCPConnection setupEventHandler]_block_invoke Connection went away while waiting for event
5657	default	10:54:07.323945+0200	MyNetworkExtension	[C7 15EA8717-040A-4634-BE44-9168217C792C Hostname#e7bf2ee1:443 tcp, interface: utun3, tls, attribution: developer] cancel
5657	default	10:54:07.324302+0200	MyNetworkExtension	[C7 Hostname#e7bf2ee1:443 tcp, interface: utun3, tls, attribution: developer] cancelled
5657	default	10:54:07.325110+0200	MyNetworkExtension	nw_connection_report_state_with_handler_on_nw_queue [C7] reporting state cancelled
5657	error	10:54:07.325596+0200	MyNetworkExtension	-[NWTCPConnection setupEventHandler]_block_invoke Connection went away while waiting for event

so there are no other active providers on the device.

Thank you for confirming.

Regarding:

When I overrode this piece of code on my machine to set it to the remote endpoint's address, however, I noticed that the connections were still timing out and the issue was not fixed.

Thank you, I just wanted to confirm here that your connections were not getting caught in a loop.

Thank you for the logs. I'd like to focus on this one first:

5657 error 10:54:07.323093+0200 MyNetworkExtension -[NWTCPConnection setupEventHandler]_block_invoke Connection went away while waiting for event

There are a few reasons that this can happen:

  1. The client timed out because it did not handle trust evaluation from the peer.
  2. The client failed to respond to provideIdentityForConnection.
  3. The connection purely timed out.
  4. A better path became available and the viability changed, which is a way of describing that the tunnel had connection and setup issues.

Based on the information that you have provided me already, I would investigate 3 or 4 as possible options unless there is authentication that is not handled.
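To help tell options 3 and 4 apart, one thing that might be worth logging whenever the state changes is the connection's own view of its path. The sketch below only reads properties that NWTCPConnection exposes; `diagnosticSummary` is a hypothetical helper name.

```swift
import NetworkExtension

/// Hypothetical helper: summarizes the properties on NWTCPConnection that
/// relate to timeouts and path changes, for logging.
func diagnosticSummary(for connection: NWTCPConnection) -> String {
    let errorText = connection.error.map { String(describing: $0) } ?? "none"
    // connectedPath is nil until the connection has actually connected.
    let pathText = connection.connectedPath.map { "status \($0.status.rawValue)" } ?? "none"
    return "state=\(connection.state.rawValue) "
        + "viable=\(connection.isViable) "
        + "betterPath=\(connection.hasBetterPath) "
        + "connectedPath=\(pathText) "
        + "error=\(errorText)"
}
```

Calling this from the same place that observes the connection's state would show whether viability or a better path ever changes before the timeout fires.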

Now, looking at these logs:

5657 default 10:53:58.911659+0200 MyNetworkExtension [C8 Hostname#dddc988d:443 initial path ((null))] event: path:start @0.000s
5657 default 10:53:58.912632+0200 MyNetworkExtension [C8 Hostname#dddc988d:443 waiting path (unsatisfied (Interface utun3[16] is required by parameters), interface: en0, scoped, ipv4, ipv6, dns)] event: path:unsatisfied @0.000s, uuid: D2339D03-2826-47AE-88E8-D122D0F78EB0

Is your tunnel completely off the ground yet, or are you starting these connections immediately in startTunnelWithOptions?

Hi Matt, thanks so much for your reply.

We pass nil as the TLS delegate parameter when calling createTCPConnectionThroughTunnel, so I don't think that should be an issue.

I'd just like to re-state the conditions of the failure mode:

  1. .includeAllNetworks set to true in the provider settings leads to connections created with createTCPConnectionThroughTunnel not connecting
  2. createTCPConnectionThroughTunnel works normally when .includeAllNetworks is set to false
  3. Normal network operations with URLSession.shared work normally in the network extension with .includeAllNetworks set to true
  4. Normal network operations on the rest of the system (i.e., outside of the network extension) work normally with .includeAllNetworks set to true

You gave the following possibilities as worth investigating (let's set aside #1 and #2 for the moment for the reasons mentioned above):

  3. The connection purely timed out.
  4. A better path became available and the viability changed, which is a way of describing that the tunnel had connection and setup issues.

These seem unlikely to me because the code is set up to time out the connection if it has been stuck in .connecting or .waiting for too long (10 seconds in this case) and to retry within a sane amount of time (5 seconds at the moment). If #3 or #4 were indeed happening, I would expect them to show up once or twice, with the request then completing successfully on the second or third try.
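For context, the timeout-and-retry behavior is along the lines of the sketch below. This is a simplified illustration, not our production code: `RetryingConnector` and its members are hypothetical names, and the factory closure is assumed to wrap a call to createTCPConnectionThroughTunnel.

```swift
import NetworkExtension

/// Hypothetical sketch: cancels a connection that has not reached .connected
/// within `connectTimeout`, then tries again after `retryDelay`.
final class RetryingConnector {
    private let makeConnection: () -> NWTCPConnection
    private let queue = DispatchQueue(label: "retrying-connector")
    private var current: NWTCPConnection?
    private var observation: NSKeyValueObservation?
    private var timeout: DispatchWorkItem?
    private let connectTimeout: TimeInterval = 10
    private let retryDelay: TimeInterval = 5

    /// Called on the connector's private queue once a connection succeeds.
    var onConnected: ((NWTCPConnection) -> Void)?

    init(makeConnection: @escaping () -> NWTCPConnection) {
        self.makeConnection = makeConnection
    }

    func start() {
        queue.async { self.attempt() }
    }

    // All mutable state is only touched on `queue`.
    private func attempt() {
        let connection = makeConnection()
        current = connection

        let item = DispatchWorkItem { [weak self] in self?.giveUp(on: connection) }
        timeout = item
        queue.asyncAfter(deadline: .now() + connectTimeout, execute: item)

        observation = connection.observe(\.state, options: [.initial, .new]) { [weak self] conn, _ in
            guard conn.state == .connected else { return }
            // KVO callbacks can arrive on any thread, so hop back to the queue.
            self?.queue.async {
                self?.timeout?.cancel()
                self?.onConnected?(conn)
            }
        }
    }

    private func giveUp(on connection: NWTCPConnection) {
        // Ignore stale timeouts and connections that made it in time.
        guard connection === current, connection.state != .connected else { return }
        observation = nil
        connection.cancel()
        queue.asyncAfter(deadline: .now() + retryDelay) { [weak self] in
            self?.attempt()
        }
    }
}
```

With a loop like this, a transient timeout or path change should clear on a later attempt, which is why the repeated failures in the logs look like something else to me.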

Is your tunnel completely off the ground yet, or are you starting these connections immediately in startTunnelWithOptions?

We start issuing requests on a queue in a function called connectionEstablished once the tunnel has been successfully started in startTunnelWithOptions:

    override func startTunnel(options: [String: NSObject]?, completionHandler: @escaping (Error?) -> Void) {
        [...]
        // Start the tunnel
        adapter.start(tunnelConfiguration: tunnelConfiguration) { adapterError in
            guard let adapterError = adapterError else {
                let interfaceName = self.adapter.interfaceName ?? "unknown"
                wg_log(.info, message: "Tunnel interface is \(interfaceName)")
                completionHandler(nil)
                self.connectionEstablished()   // <-- requests start being issued here
                return
            }

            switch adapterError {
            [...]
            }
        }
        [...]
    }

I would appreciate if you could elaborate on what in that last log line would indicate a network path issue.

Yeah, this one is odd. I have seen cases where using includeAllNetworks causes problems for networking tasks, e.g., DNS resolution or authentication outside of the VPN server before the tunnel comes up. In this case, if your connection to the remote address is truly inside the tunnel, then that should not be causing issues here. You mentioned earlier that you were using a WireGuard-based tunnel. If you go back to a vanilla install of NEPacketTunnelProvider, does this work for you?

Getting back to this - not sure what you mean by 'base install' of NEPacketTunnelProvider. Are you referring to something which just shuttles the packets without doing anything to them? Is that what the base version does? I tried using the base WireGuard implementation (had other things to take care of, hence the delay), but ended up encountering the same issue.

not sure what you mean by 'base install' of NEPacketTunnelProvider. Are you referring to something which just shuttles the packets without doing anything to them? Is that what the base version does?

Yes, if you set up a tunnel with just a plain NEPacketTunnelProvider, without WireGuard's implementation, just to prove the machinery works, does this still cause issues? The idea here is to expose whether there is another underlying problem in your code or network routing, or whether something in the external SDK is causing issues with your environment.
