Network speed slow down using NETransparentProxyProvider system extension

Hi,

I am developing a simple passthrough proxy system extension using NETransparentProxyProvider. This is what the extension fundamentally does:

  1. In handleNewFlow, open a connection to the remote endpoint using the createTCPConnection method of the tunnel provider.
  2. Once the remote endpoint is connected, open the NEAppProxyTCPFlow and start both ends of the flow.

When I use iperf to test the network speed while sending, I see a 10x drop in speed with my system extension enabled.

iperf -c <server_address>

By default, iperf sends data in 131072-byte blocks for 10 seconds.

My code for inbound and outbound flows is quite simple:

For the inbound flow, I read from the remote connection. In the read completion handler I write to the flow, and in the flow write completion handler I start another read from the remote connection.

For the outbound flow, I read from the flow. In the read completion handler I write to the remote connection, and in the remote write completion handler I trigger another read from the flow.
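
A minimal sketch of this read/write chaining, for illustration only; it is not the extension's actual code. To keep it runnable on its own it is written against two hypothetical protocols, `ChunkSource` and `ChunkSink`, rather than the real NetworkExtension types. In the extension, the source would presumably wrap `NEAppProxyTCPFlow.readData(completionHandler:)` and the sink would wrap `NWTCPConnection.write(_:completionHandler:)`. `ArraySource` and `CollectingSink` are in-memory stand-ins that exist only to exercise the loop.

```swift
import Foundation

// Hypothetical abstractions, not NetworkExtension types.
protocol ChunkSource {
    // Delivers the next chunk; nil or empty data means end of stream.
    func read(_ completion: @escaping (Data?, Error?) -> Void)
}

protocol ChunkSink {
    func write(_ data: Data, _ completion: @escaping (Error?) -> Void)
}

// Copies one direction: read, write, and only then issue the next read.
// At most one chunk is in flight at any time.
func copyOneDirection(from source: ChunkSource,
                      to sink: ChunkSink,
                      done: @escaping (Error?) -> Void) {
    source.read { data, error in
        if let error = error { done(error); return }
        guard let data = data, !data.isEmpty else { done(nil); return }
        sink.write(data) { error in
            if let error = error { done(error); return }
            // The write has completed, so trigger the next read.
            copyOneDirection(from: source, to: sink, done: done)
        }
    }
}

// In-memory stand-in that hands out a fixed list of chunks.
final class ArraySource: ChunkSource {
    private var chunks: [Data]
    init(_ chunks: [Data]) { self.chunks = chunks }
    func read(_ completion: @escaping (Data?, Error?) -> Void) {
        completion(chunks.isEmpty ? nil : chunks.removeFirst(), nil)
    }
}

// In-memory stand-in that records every write it receives.
final class CollectingSink: ChunkSink {
    private(set) var writes: [Data] = []
    func write(_ data: Data, _ completion: @escaping (Error?) -> Void) {
        writes.append(data)
        completion(nil)
    }
}
```

With this shape, each chunk read from the flow becomes exactly one write to the remote side, and the next read is not issued until that write completes. The chunk sizes chosen by the flow therefore carry over directly to the writes.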

Is there any problem with the above approach which can cause network transfer slowdown?

I also captured Wireshark traces for cases with and without my system extension and I see a pattern there.

When I read from the flow, the system extension receives chunks of varying sizes irrespective of what the application is sending. For example, I see 4096, 16384, and 8192 bytes. When I send these chunks to the remote connection, it keeps waiting for ACKs for each chunk irrespective of the TCP window size. I also see a [PSH, ACK] in the last packet of each chunk.

Without my system extension, iperf sends many packets in a short time without [PSH, ACK], as it is using a bigger buffer and does not wait for ACKs so frequently. It respects the TCP window size.

I can provide any details needed to help root cause this problem. I am testing this on macOS Big Sur 11.5.1.

Any help is greatly appreciated

Regards

My code for inbound and outbound flows is quite simple: For the inbound flow, I read from the remote connection. In the read completion handler I write to the flow, and in the flow write completion handler I start another read from the remote connection. For the outbound flow, I read from the flow. In the read completion handler I write to the remote connection, and in the remote write completion handler I trigger another read from the flow. Is there any problem with the above approach which can cause network transfer slowdown?

Nope, this is roughly the same approach spelled out here in Handling Flow Copying.

Regarding:

When I read from the flow, the system extension receives chunks of varying sizes irrespective of what the application is sending. For example, I see 4096, 16384, and 8192 bytes. When I send these chunks to the remote connection, it keeps waiting for ACKs for each chunk irrespective of the TCP window size. I also see a [PSH, ACK] in the last packet of each chunk.

There may be a good reason why this is happening. One possibility is that the provider is waiting to see if there is more data that needs to be pushed to the peer, whereas a standard TCP connection may be using TCP_NODELAY to push the data right away without waiting for the window to fill up.

Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com

Hi Matt,

Thanks very much for responding to the thread. Would you be able to advise me on how to fix this problem? I assume the completion handler for NWTCPConnection.write is called as soon as the data is written into the local socket buffer and does not wait for ACKs from the remote side.

The issue is happening consistently and is fairly easy to reproduce. I also see CPU usage for my system extension fluctuate around 50% when running the iperf test. I can check anything you would like to investigate at my end. If you want I can also share the code or the Wireshark captures that I collected.

Regards

I assume the completion handler for NWTCPConnection.write is called as soon as the data is written into the local socket buffer and does not wait for ACKs from the remote side.

Are these connections using TLS? If not, do you see a more consistent pattern in how data is sent and received on the connection when you do use TLS?

Regarding:

I also see CPU usage for my system extension fluctuate around 50% when running the iperf test

Take a look at this in Instruments under either Time Profiler or Allocations to see if there are performance gains that could be achieved here.

Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com

Hi Matt,

We are not using TLS connections. This is what I do:

connection = provider.createTCPConnection(to: appProxyFlow.remoteEndpoint, enableTLS: false, tlsParameters: nil, delegate: nil)

I didn't understand your point on using TLS here. I am just forwarding whatever I get from application flow to the TCP connection created above. Please note that I don't have any tunnel server. I am passing to the remote endpoint just using the connection created above. I just want to see all traffic and note some metadata about connection like: when the connection was established, do some web activity monitoring etc. I don't intend to decrypt TLS in any way.

I am digressing from the original topic here, but I also tried using NEFilterDataProvider to monitor the traffic and I see a similar network speed slowdown. I am not doing anything in the handleInbound and handleOutbound callbacks except for the line below:

return NEFilterDataVerdict(passBytes: readBytes.count, peekBytes: Int.max)

I also tried playing a bit with what I pass to peekBytes but didn't see much change, and I couldn't find any guidelines on how to set peekBytes. I can open a separate thread for this if you would like.

I will check in Instruments if I can do something about the CPU usage, but the network speed slowdown is a blocker for me.

Regards

I didn't understand your point on using TLS here.

I am asking this to see if using TLS makes the packet transmission a bit more consistent in terms of transmission size.

Regarding:

I also tried playing a bit with what I pass to peekBytes but didn't see much change, and I couldn't find any guidelines on how to set peekBytes

Using peekBytes will always incur a bit of a performance overhead because you are examining chunks of the inbound and outbound traffic. There is no real recommendation on peekBytes here, just enough so that you can make a determination on the filter verdict.

Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com

Hi Matt, 

Sure, I will check the transmission size for TLS connections. 

Regarding FilterDataProviders, your comment suggests to me that FilterDataProviders may not be a good solution for monitoring data from the start of the connection to the termination of connection. If this is true, what should be used to monitor traffic for the entire duration of the connection? 

Regards

Regarding FilterDataProviders, your comment suggests to me that FilterDataProviders may not be a good solution for monitoring data from the start of the connection to the termination of connection. If this is true, what should be used to monitor traffic for the entire duration of the connection?

This is the recommended solution for monitoring the bytes flowing in and out of the connection. My point was that, to improve performance, you should only monitor what you need to make a flow verdict decision instead of monitoring all the way to the very end.
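
As a rough illustration of that point (a sketch under stated assumptions, not Apple sample code), the decision could look like the following. `DataDecision`, `decide`, and `inspectLimit` are hypothetical names introduced here so the logic can run on its own. In a real NEFilterDataProvider data handler, `.keepInspecting` would map to `NEFilterDataVerdict(passBytes:peekBytes:)` and `.allowRest` would map to `NEFilterDataVerdict.allow()`.

```swift
// Hypothetical decision type standing in for NEFilterDataVerdict.
enum DataDecision: Equatable {
    case keepInspecting(passBytes: Int, peekBytes: Int)
    case allowRest
}

// Decides what to return from a data handler that was given `readCount`
// bytes starting at `offset`. `inspectLimit` is an assumed per-flow budget:
// once that many bytes have been seen, the rest of the flow is allowed
// without asking for further data callbacks.
func decide(offset: Int, readCount: Int, inspectLimit: Int) -> DataDecision {
    let seen = offset + readCount
    if seen >= inspectLimit {
        return .allowRest
    }
    // Pass what was just examined and ask only for the remaining budget.
    return .keepInspecting(passBytes: readCount, peekBytes: inspectLimit - seen)
}
```

By contrast, returning `peekBytes: Int.max` on every call keeps the filter in the data path for the whole lifetime of the flow.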

Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com

Hi Matt,

I don't want to make any flow verdict, I just want to monitor the traffic for the entire duration of the connection. I return

NEFilterDataVerdict(passBytes: readBytes.count, peekBytes: Int.max)

from my inbound and outbound data handlers. This is resulting in a huge drop of more than 10 times in network bandwidth.

Please let me know if I didn't understand your answer correctly.

Is there any update on this issue? I am facing the same problem.
