100% CPU usage in Apple SSH and node.js apps when NETransparentProxyProvider is running

I'm developing transparent proxy extension and noticed the following misbehaviour:

When NETransparentProxyProvider starts, if there are existing connections that matches rules, apps with these connections will get error (-41 Protocol wrong type for socket) on write()/send() syscall.

This leads Apple SSH and node.js to consume 100% CPU because they want to write to socket and can’t, while not receiving read event with an error.

Steps to reproduce (on Big Sur):

  1. Run SSH to some server in the internet.
  2. Run NETransparentProxyProvider example from https://github.com/sfionov/expert-octo-chainsaw
  3. Press a key in SSH
  4. See that Apple SSH consumes 100% CPU in Activity monitor.

In Activity Monitor’s sample you may see thousands of selects and unsuccessful writes.

Usually socket error is processed in read event, not write event, so lack of read event with error on unusable socket confuses a lot. :(

node.js apps with libuv SSL sockets start consuming a lot of CPU too.

Is there any workaround? Or a way I could force this applications to close unusable sockets after network extension start?

Thanks for including a link to your test project and the steps you took to reproduce the issue. With a very generic case I tried to reproduce what you were seeing. First, it looked like your were capturing TCP flows and then just returning NO to let the system handle this SSH / TCP traffic. So I setup some generic NENetworkRule's to capture all TCP/UDP flows and then returned false in handleNewFlow.

let settings = NETransparentProxyNetworkSettings(tunnelRemoteAddress: "x.x.x.x")
    settings.includedNetworkRules = [
        NENetworkRule(remoteNetwork: nil,
                      remotePrefix: 0,
                      localNetwork: nil,
                      localPrefix: 0,
                      protocol:.TCP,
                      direction: .outbound),
        
        NENetworkRule(remoteNetwork: nil,
                      remotePrefix: 0,
                      localNetwork: nil,
                      localPrefix: 0,
                      protocol:.UDP,
                      direction: .outbound)
        
    ]

override func handleNewFlow(_ flow: NEAppProxyFlow) -> Bool {
         
    return false
}

And then opened up an SSH session to a remote server. All worked fine and I barely seen any movement on the CPU usage for the SSH process. The highest it went was ~0.1%.

Now, having said that there may be some differences in my test compared to your test. My test was on macOS 11.5 Beta and I was not doing any network heavy work over SSH. I was just browsing around the server and opening files etc.. What is the work you are doing over SSH? I also opened my SSH session after the NETransparentProxyProvider started in case that matters for this scenario?

Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com

Hello!

It is very important that SSH should be connected while no transparent network extensions are running in the system.

Then I start network extension and just press a key in SSH, and voila:

It is also reproduced with node.js with SSL. If I run the following code with link to large file:

https = require('https')
https.get('link_to_large_file');

And then start transparent proxy network extension, and in minute node.js will eat CPU too.

It is very important that SSH should be connected while no transparent network extensions are running in the system. Then I start network extension and just press a key in SSH, and voila:

Okay, I was able to reproduce this today by starting the SSH session before I started my test with NETransparentProxyProvider. I did see the CPU usage jump to 99.0 - 100.0% for the SSH process as soon as I tried to run a command in the session. Disconnecting the SSH session and reconnecting it did workaround this issue with no CPU problems, but this should definitely be captured in a bug report. Also, please include a sysdiagnose when you submit your bug and include the exact time and date the issue was reproduced.

Please follow up with your Feedback ID when you have it.

Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com

I actually already created the feedback: FB9125326

Also added information how to reproduce bug using "node.js" to it.

I actually already created the feedback: FB9125326

Thank you, I see this internally and have copied myself on it for more information.

Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com

JFY, this bug has been reported by me as well (FB9070195) and discussed on the forum: https://developer.apple.com/forums/thread/679465

The interesting thing is that a more recent version of OpenSSH has a fix that properly detect this unexpected behavior: https://github.com/openssh/openssh-portable/commit/65d6fd0a8a6f31c3ddf0c1192429a176575cf701

JFY, this bug has been reported by me as well

I’m not sure why DevForums is censoring FB numbers again [1] but I thought I’d post a non-censored bug number so that Future Quinn™ (and Future Matt™) have it handy: FB9070195.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

[1] I’ve filed a new bug about that (r. 79064328 ).

The interesting thing is that a more recent version of OpenSSH has a fix that properly detect this unexpected behavior

Eventually, Apple SSH and libuv (nodejs) may adopt to this behaviour too... But it will not be an ideal solution, socket continues to behave in a way that usual socket doesn't, and there are may be other apps that doesn't expect this. 😦

On Big Sur 12 beta 2 and 11.5 beta 4 issue is still reproducible with node.js.

Almost all Electron apps including Signal messenger are affected. :(

Please continue to add any extra information that has been obtained through test cases, reports, or by any other means to your bug report to aide in the investigation. This bug is with the correct people, there is just no update to share at this time.

Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com

FYI, SSH problem is fixed in Big Sur 12 beta. Apparently, this was done by SSH update while the root problem still persists (FB9125326). Many other apps are affected.

I think, the best way to solve this and many other problems would be to keep existing connections alive when App Proxy is activated (FB8969320).

100% CPU usage in Apple SSH and node.js apps when NETransparentProxyProvider is running
 
 
Q