NETransparentProxyProvider disturbs other apps on start

I wanted to draw some attention to the problems caused when a NETransparentProxyProvider starts.
1. It breaks all existing TCP connections, which is not good.

2. UDP "connections" just hang. This causes a lot of trouble for UDP-based apps like VPNs, streaming, etc. Unlike TCP apps, UDP apps are usually not designed to recreate the UDP socket to clear the problem, so users have to restart the apps manually.
FB8969320

3. Any running instance of the built-in SSH client hangs when NETransparentProxyProvider starts and begins to utilize 100% of CPU. This is very confusing and annoying for users.
FB9070195
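
To illustrate point 2, here is a minimal Python sketch of what "recreating the UDP socket" looks like on the app side. It assumes a simple request/reply protocol, and the names `make_udp_socket` and `request_with_socket_recreation` are hypothetical, made up for this example. It does not claim to be how any particular VPN or streaming app works; it only shows the pattern most UDP apps lack: give up on a silent socket, close it, and retry on a fresh one.

```python
import socket


def make_udp_socket(timeout):
    """Create a fresh IPv4 UDP socket with a receive timeout (seconds)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    return sock


def request_with_socket_recreation(server, payload, timeout=1.0, max_attempts=3):
    """Send payload to server and wait for one reply.

    Every attempt uses a brand-new socket, so a socket that has gone
    silent (timeout) or started failing (OSError) is simply discarded.
    Returns (reply_bytes, attempt_number).
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        sock = make_udp_socket(timeout)
        try:
            sock.sendto(payload, server)
            reply, _addr = sock.recvfrom(65535)
            return reply, attempt
        except OSError as exc:  # includes the receive timeout
            last_error = exc
        finally:
            sock.close()
    raise ConnectionError(f"no reply after {max_attempts} attempts") from last_error
```

A timeout is only a heuristic for "the socket hangs"; a real app would have to choose it so that ordinary packet loss does not trigger needless socket churn.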

All these problems persist in Big Sur 11.4 Beta. Are there any plans to fix them?
I can provide more info if needed.

Thanks,
Sergey

I can tell you that these bugs are not being ignored and are in the right hands, but there is no information to share on the roadmap for these items or if/when they will be fixed.


Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com

As for the SSH client utilizing 100% of CPU (FB9070195):

We have analyzed the source code of the macOS SSH client. It seems that the problem is known and has already been fixed in upstream OpenSSH:
https://github.com/openssh/openssh-portable/commit/65d6fd0a8a6f31c3ddf0c1192429a176575cf701

So it may be necessary to update SSH or apply the fix (it is very small).

Okay, so it looks like the OpenSSH library is able to work around this by terminating the existing connection only if a non-zero value is returned from ssh_packet_write_poll? Is that what you are mentioning here? The reason I am asking is that if this is a known workaround for this issue, it might be handy to mention it here. Have you had success with this, Sergey?


Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com

I am not an expert in OpenSSH code. It seems that under normal conditions (regular socket errors), connections get terminated properly. But when NETransparentProxyProvider starts, unusual things happen. The sockets get unusual errors like error 41 (protocol wrong type for socket), etc. This somehow breaks the SSH logic. It goes into an infinite loop and ssh_packet_write_poll returns a non-zero value. The fix just checks for this condition and terminates the connection.
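
To show the general idea (this is not the OpenSSH code, just a small Python sketch of the pattern as I understand it): a write loop should retry only on errors that are known to be transient and treat everything else, including something unexpected like EPROTOTYPE, as a lost connection instead of looping forever. The names `write_all`, `ConnectionLost` and `RETRYABLE` are hypothetical, and `send` stands in for any function that writes bytes and raises OSError on failure.

```python
import errno

# Errors that mean "try again later", not "the connection is dead".
RETRYABLE = {errno.EAGAIN, errno.EWOULDBLOCK, errno.EINTR}


class ConnectionLost(Exception):
    """Raised when a write fails with an error that retrying cannot fix."""


def write_all(send, data):
    """Write all of data using send(chunk) -> number of bytes written.

    Retries on transient errors; any other OSError ends the loop by
    raising ConnectionLost, so the caller can tear the connection down.
    Returns the total number of bytes written.
    """
    sent = 0
    while sent < len(data):
        try:
            sent += send(data[sent:])
        except OSError as exc:
            if exc.errno in RETRYABLE:
                # A real client would wait in select()/poll() before retrying.
                continue
            raise ConnectionLost(f"write failed: {exc}") from exc
    return sent
```

With a loop like this, an unusual socket error ends the connection cleanly instead of spinning the CPU.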

We are fairly sure that this fixes the problem, because the current version of OpenSSH (8.6 from brew) works without the problem. However, we were unable to compile the SSH client that comes with macOS (opensource.apple.com). The build environment is too complex to set up properly.

Another (and better) way to solve this is to keep existing connections working after NETransparentProxyProvider starts, or at least to break them in the regular way that networking apps like SSH expect and are tested against.

Agreed. Thank you for the insight here.

Please keep your bug report updated as you discover any new information.

Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com

Will do.

That said, it is still a good idea to update the macOS SSH client. The above-mentioned fix was added for a reason, and the original problem was not related to Network Extension. This means the bug can occur under some conditions even without Network Extension.
