UDP Transparent proxy stops working after ENOBUFS

Hi,

I have a simple UDP Transparent proxy test - I open udp socket from a python program and blast datagrams in a tight while loop, and in the proxy system extension code, I just echo the datagram - i.e. I do flow.readDatagram() and then write it back using flow.writeDatagram(). I try the above with datagram sizes upto 9000 bytes, it seems to work allright. I launch two dozen of these python tests parallely and all of them write data and read the data back.

The problem starts when I increase the datagramsize - say I make it 9100 bytes, the writes from the python test script often ends up with a "write(net): No buffer space available" - which is totally fine and understandable if the kernel runs out of buffers to store all these datagrams. But the issue is that if I kill all those python scripts and then try sending data to that destination at a much slower rate (like using a netcat), my transparent proxy code does not get the handleNewUDPFlow() callback. For every new flow/endpoint I usually see a bunch of logs which says handleNewUDPFlow with filter so and so etc.. - after ENOBUFS its radio silence, no callbacks, none of those logs.

It looks like some bug in the transparent proxy library, is this a test case anyone else has tried/faced similar issues ?

Rgds, Gopa.

When building an NE provider you have to make sure that you don’t consume too much memory. I discuss this in general Network Extension Provider Memory Strategy but let’s ask about this specifically:

I do flow.readDatagram() and then write it back using flow.writeDatagram().

Do you delay the next read until after the write has completed? The structure I recommend here is something like pseudo code:

func readNext()
    flow.read() { datagram in
        flow.write(datagram) {
            readNext()
        }
    }
}

This results in a strict limit on how much data gets buffered in your provider.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Hi Quinn,

Thanks for the response, so I have made the problem statement even more easier for you / apple :-) - I don't even need to write it back, I am just reading the frames as and when they come and doing nothing / i.e drop it. And if I blast 9216 bytes from the python side, three or four of those flows for 30 seconds (till ENOBUFs start popping up) can cause the entire NETransparent proxy stuff to lockup. Note that after this, NOT JUST UDP, TCP IS ALSO LOCKED UP !!! Even tcp handleNewFlow() is not getting called!

Whatever happens, how much ever data is sent, enobufs or not, the code should not lock up like this - that is a BAD bug! And that too it locks up TCP too! Please can you check with the team owning this library and see whats going on?

--- apple NE code--

   func readUdp(_ flowid: Int64, _ flow: NEAppProxyUDPFlow, _ endp: NWHostEndpoint) {
       flow.readDatagrams(completionHandler: { data, _, error in
           guard let data = data else {
                return
          }
          if error != nil {
              return
          }
          if data.count == 0 {
               return
          }
          for data in data {
              if data.count == 0 {
                  return
              }
          }
          // Do nothing, just read more from this flow
          readUdp(flowid, flow, endp)
        })
  }

    func handleNewUDPFlow(_ flow: NEAppProxyUDPFlow, initialRemoteEndpoint endPoint: NWEndpoint) -> Bool {
        flow.open(withLocalEndpoint: nil, completionHandler: { error in
            if let err = error {
                logger.error("flow open error")
            } else {
                readUdp(flowid, flow, endp)
            }
      }

---- Python code to blast packets-----

#!/usr/bin/env python3

import time
import os
from random import *
import subprocess
import sys
import socket
import errno

iter = 1
enobuf = False
send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) 
send.settimeout(5.0)

while True:
    print("----- ITERATION ", iter, "------")
    iter += 1

    chars = []
    size = randint(1, 9216)
    hdr = size.to_bytes(2, byteorder='big')
    for i in range(0,size-2):
        chars.append(chr(ord('a')+i%30))

    out = ''.join(chars)
    try:
        send.sendto(hdr + bytes(out, 'utf-8'), ("100.64.0.99", 555))
    except OSError as e:
        if e.errno == errno.ENOBUFS:
            print("ENOBUF")
            enobuf = True
            pass


Rgds, Gopa.

Also Quinn, just to calrify, there is NO memory buildup in the process running the system extension. The memory usage when I am blasting the packets and doing the above test remains more or less the same as the baseline memory usage. So its not that the provider is buffering up too much or anything like that. Something / some state machine inside the transparent proxy lib is going haywire, the serious part is it leads to a completely lockup of tcp and udp !

Also looks like someone else has reported this but not narrowed down as much, looks similar for sure - thread 700065 

Also let me know if you need my help with repro or you want me to screen share and show the problem etc.., I will be more than happy to help in anyways to get this fixed!

So Quinn, I did more experimentation trying to help narrow this down - I had missed this piece of information in the logs when the issue happened

com.edgeguard.tunnel: (NetworkExtension) [com.apple.networkextension:] (0): Failed to read an entire packet
com.edgeguard.tunnel: (NetworkExtension) [com.apple.networkextension:] (0): control: aborting director due to EOF
com.edgeguard.tunnel: (NetworkExtension) [com.apple.networkextension:] (0): Aborting the director
com.edgeguard.tunnel: (NetworkExtension) [com.apple.networkextension:] (2061696646): Closing reads (sending SHUT_WR), closed by plugin
com.edgeguard.tunnel: (NetworkExtension) [com.apple.networkextension:] (2061696646): Closing writes, sending SHUT_RD
com.edgeguard.tunnel: (NetworkExtension) [com.apple.networkextension:] (2037585060): Closing reads (sending SHUT_WR), closed by plugin
com.edgeguard.tunnel: (NetworkExtension) [com.apple.networkextension:] (2037585060): Closing writes, sending SHUT_RD

And as soon as that happens, on all the flows on which I am reading, I get a data with count 0, which I guess means EOF, and I close those flows.

And after this point, I do not get any new flows on tcp or udp into the proxy!

I understand if the "director" has to close flows because of bad length etc.., thats fine by me, but getting the entire proxy stuck because of this - thats a HUGE problem!

There is another thread 700397 that talks about this same issue, so looks like I am at least the third one to report this problem

I don't even need to write it back, I am just reading the frames as and when they come and doing nothing / i.e drop it.

Well, that’s not good. This is definitely seems bugworthy to me, and I encourage you to file a bug about it. Please post your bug number, just for the record.

Also looks like someone else has reported this but not narrowed down as much, looks similar for sure - thread 700065

There is another thread 700397 that talks about this same issue, so looks like I am at least the third one to report this problem

Well, mentioned this problem in DevForums. Unfortunately neither of the folks on those other threads posted a bug number, so there’s no way for me to track down Apple’s official response to this.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Thanks Quinn. I have opened a ticket here https://feedbackassistant.apple.com/feedback/11427408 - unfortunately with this bug in picture, we (or anyone) cannot use UDP proxy reliably because of the looming threat of locking up udp and tcp on getting an enobufs, I sincerely hope apple can fix this fast!

Sounds related to the same issue I reported some years ago https://developer.apple.com/forums/thread/131827 (there were other threads too)

UDP Transparent proxy stops working after ENOBUFS
 
 
Q