A network extension breaks the Flutter package manager

Any kind of network extension, when active, even when it doesn't actually interfere with the network traffic (e.g. always returning NO from handleNewFlow: of NETransparentProxyProvider), seems to break the Flutter package manager:
Code Block
Running "flutter pub upgrade" in flutter_repro...           ⣽Unhandled exception:
Bad state: Future already completed
#0   _AsyncCompleter.complete (dart:async/future_impl.dart:43:31)
#1   _NativeSocket.startConnect.<anonymous closure>.connectNext.<anonymous closure> (dart:io-patch/socket_patch.dart:682:23)
#2   _NativeSocket.issueWriteEvent.issue (dart:io-patch/socket_patch.dart:1102:14)
#3   _NativeSocket.issueWriteEvent (dart:io-patch/socket_patch.dart:1109:12)
#4   _NativeSocket.multiplex (dart:io-patch/socket_patch.dart:1130:11)
#5   _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart:168:12)

I've managed to reproduce this with a sample network extension project and filed feedback FB8952020.

Any thoughts on how to work around this issue would be appreciated.
Thanks for opening a bug report. I have a few follow-up questions, and I would ask you to add the answers to your bug report. You mentioned that this happens with "Any kind of network extension," and then you mentioned NETransparentProxyProvider, so does that mean that this is specific to macOS 11 and NETransparentProxyProvider, or would this also reproduce on NEFilterDataProvider? Is this specific to macOS 11?

Also, I would add the time and date your handleNewFlow method returned NO on the failed connection to your bug report. The reason I mention this is that I can see a flow that may have originated from an app named dart, but I can see what looks like an attempt to handle this flow with the following logs:

Code Block text
2020-12-25 09:15:01.907+0300 kernel (3199889955): Created
2020-12-25 09:15:01.907+0300 kernel (3199889955): Connecting
2020-12-25 09:15:01.907+0300 TheExtension[76398:340025] [] (0): Flow 3199889955 is connecting
2020-12-25 09:15:01.907+0300 TheExtension[76398:340025] [] (3199889955): New flow: NEFlow type = stream, app = dart, name = , x.x.x.x:0 <-> x.x.x.x:443, filter_id = , interface = en0
2020-12-25 09:15:01.907+0300 kernel (3199889955): received connect result 61
2020-12-25 09:15:01.907+0300 kernel (3199889955): No local address provided
2020-12-25 09:15:01.907+0300 kernel (3199889955): No remote address provided
2020-12-25 09:15:01.907+0300 kernel (3199889955): No application data provided in connect result
...
2020-12-25 09:15:01.907+0300 kernel (3199889955): Destroying, app tx 0, tunnel tx 0, tunnel rx 0
2020-12-25 09:15:01.907+0300 TheExtension[76398:340025] [] (3199889955): Destroying, client tx 0, client rx 0, kernel rx 0, kernel tx 0


However, this may not exactly be the case you are calling out here and just looks like it. So if there is a more focused example to call out, that would be great info to add.

Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com
Hi, @meaton

I stand corrected. Initially I was under the impression that any kind of network extension is affected, but after careful checking, it's only NETransparentProxyProvider that is causing trouble.

The included sample always returns NO from handleNewFlow::
Code Block
- (BOOL)handleNewFlow:(NEAppProxyFlow *)flow {
    return NO;
    // commented code omitted
}

So it's safe to assume that any attempt to handle this flow is actually incorrect behaviour of the API.

This also means that only macOS 11 is affected (unless NETransparentProxyProvider has been ported to Catalina as well).
Regarding:

I stand corrected. Initially I was under the impression that any kind of network extension is affected, but after careful checking, it's only NETransparentProxyProvider that is causing trouble.

Thank you for the clarification on this. I also did a quick check this morning and did see that your bug report has landed in the right place.


Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com
Hi, @meaton!
Is there any news regarding this issue? I can still reproduce on macOS 11.2 (20D64).
No update at this time. There were some fixes that went out in (20D64), but as far as I know, this was not one of the fixes. I should note that I do not have Flutter installed or configured on my test machine so I am not able to independently test this either.


Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com
The Flutter package manager uses Dart to do its work, and NETransparentProxyManager breaks all TCP connections made from the Dart interpreter. The problem is that Dart opens the connection asynchronously (with a socket in non-blocking mode) and tries to get the local port without waiting for the connection to be established. But if NETransparentProxyManager is enabled, the local port isn't available yet (it becomes available only after the [NEAppProxyFlow openWithLocalEndpoint:completionHandler:] call), so the Dart code throws an exception, which causes the connection to fail. I've filed feedback about it: FB8999915.
Thank you @dverevkin. Regarding:

The problem is that Dart opens the connection asynchronously (with a socket in non-blocking mode) and tries to get the local port without waiting for the connection to be established.

If the Dart side waits for the remote side of the connection to be set up first and then calls:

[NEAppProxyFlow openWithLocalEndpoint:completionHandler:]

Does this work out better then with a localEndpoint?


Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com

If the Dart side waits for the remote side of the connection to be set up first and then calls:
[NEAppProxyFlow openWithLocalEndpoint:completionHandler:] 
Does this work out better then with a localEndpoint?

I've changed the Dart SDK code to get the local port only after the connection is established, and everything seems to work fine. I'm not sure about the right solution: should the Dart developers fix their code, or should the NE API be changed so that localEndpoint can be set before the [NEAppProxyFlow openWithLocalEndpoint:completionHandler:] call?

I've changed the Dart SDK code to get the local port only after the connection is established, and everything seems to work fine.

Excellent news.

I'm not sure about the right solution: should the Dart developers fix their code, or should the NE API be changed so that localEndpoint can be set before the [NEAppProxyFlow openWithLocalEndpoint:completionHandler:] call?

As a matter of workflow, I always recommend first opening the remote side of the connection with an API like NWConnection or nw_connection_t. Then, after the remote side has gone into the ready state, open the local flow via [NEAppProxyFlow openWithLocalEndpoint:completionHandler:].


Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com

As a matter of workflow, I always recommend first opening the remote side of the connection with an API like NWConnection or nw_connection_t. Then, after the remote side has gone into the ready state, open the local flow via [NEAppProxyFlow openWithLocalEndpoint:completionHandler:].

Yes, that is how I do it in my network extension. But the problem is that the Dart interpreter doesn't wait for the connection to be established and tries to get the local port (via getsockname) right after the connect() call on the non-blocking socket. So, the sequence of events is the following:
  1. Dart calls connect() on a non-blocking socket.

  2. The system intercepts Dart's flow and calls my [NETransparentProxyProvider handleNewFlow:] handler.

  3. My network extension starts an nw_connection_t and returns from the [NETransparentProxyProvider handleNewFlow:] handler. As an alternate scenario, it can just return false to hand the intercepted flow back to the system.

  4. Dart's control flow returns from connect() with errno set to EINPROGRESS.

  5. Right after connect() returns, Dart calls getsockname() to get the local port of the TCP connection and gets 0, which isn't the expected value. So Dart throws an exception and fails the TCP connection (closing its socket).

  6. After the nw_connection_t becomes ready, the network extension calls [NEAppProxyFlow openWithLocalEndpoint:completionHandler:] with a localEndpoint corresponding to that of the nw_connection_t. But Dart has already closed the connection, so the completion handler of [NEAppProxyFlow openWithLocalEndpoint:completionHandler:] returns an error (flow is not connected, or something like that).
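
The ordering problem in steps 4 and 5 can be illustrated outside Dart with plain BSD sockets. The following is a minimal Python sketch, not the Dart SDK patch mentioned in this thread: the helper name connect_then_local_port and its timeout parameter are hypothetical, and it assumes a POSIX system where a non-blocking connect reports EINPROGRESS. It waits until the socket is writable and SO_ERROR is clear before calling getsockname(), instead of calling it right after connect() returns.

```python
import errno
import os
import select
import socket


def connect_then_local_port(host, port, timeout=5.0):
    """Hypothetical helper: non-blocking connect that reads the local
    port only after the connection is established."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    try:
        # connect_ex() returns an errno value instead of raising.
        rc = sock.connect_ex((host, port))
        if rc not in (0, errno.EINPROGRESS):
            raise OSError(rc, os.strerror(rc))

        # A non-blocking socket becomes writable once the connect
        # attempt has finished, successfully or not.
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            raise TimeoutError("connect did not finish in time")

        # SO_ERROR tells us whether the attempt actually succeeded.
        err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if err != 0:
            raise OSError(err, os.strerror(err))

        # Only now ask for the local address; doing this right after
        # connect() is the step that fails in the sequence above.
        return sock.getsockname()[1]
    finally:
        sock.close()
```

Calling connect_then_local_port("127.0.0.1", some_listening_port) against a local listener returns the ephemeral port chosen for the connection. Whether this ordering matches what the Dart runtime needs internally is an assumption; the sketch only shows the wait-then-query pattern.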

So, should I open a new issue in the Dart GitHub project or expect a new API in NetworkExtension.framework?

Regards,
Denis Verevkin
Thank you, Denis. In this case I would open an issue against the Dart project so there is a detailed description for other developers of how to work around this issue. From there, make sure to keep your bug report open and updated on our end, and you should receive updates as things progress.


Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com
@dverevkin Hi! Could you please submit your patch or open a bug report with what you could find out to Dart? I've looked through their GitHub issues and got the impression that the Dart developers don't have a clue what's happening or how to fix it, while a few people are having this exact issue in different circumstances.
@dverevkin

Denis, please take a look at this issue:
https://github.com/dart-lang/sdk/issues/45116

It'd be useful if you could supply info about your patch there.
@dverevkin Please could you add your patch either here or to https://github.com/dart-lang/sdk/issues/45116 - there are a lot of people (including myself) who would greatly benefit from this.

Many thanks

Nic Ford