Transparent network proxy ... stops?

I don't know how to go forward on this one: we have a test engineer who can, reliably, cause networking to simply stop working. Our app has 3 major components -- a proxy daemon, a containing UI app, and a network extension. Because I am lousy at using debuggers, the extension logs every single new flow it gets (to .debug), as well as a bunch more.

When our engineer hits this problem, the proxy may crash a couple of times, but is still running; the extension is also still running, but no longer gets new flows. Networking outside the machine no longer works. But doing echo foo | nc 127.0.0.1 88 succeeds (or, at least, doesn't print any error -- and also doesn't produce any log messages from the extension).

I've got a sysdiagnose from it, as well as a bunch of logs, and all I can really see is that the proxy app restarted, and when it came back, it said there was no networking available. And that the extension stopped logging new flows at about the same time.

I have not been able to reproduce this -- even though our engineer is using the same script I wrote to try to reproduce it, and he can trigger it within an hour. (As opposed to my systems, which have been running for almost a day on both an M1 and an Intel system.)

Any ideas of things I should try looking for in the sysdiagnose?

"Our app has 3 major components -- a proxy daemon, a containing UI app, and a network extension. [...] the proxy may crash a couple of times, but is still running; the extension is also still running, but no longer gets new flows. [...] even though our engineer is using the same script I wrote to try to reproduce it, and he can, within an hour"

Thank you for that information. What is the proxy daemon doing in your workflow, and is your network extension communicating with your proxy? If you are proxying flows from the daemon, is your daemon running out of file descriptors? Do you see an error that says "Too many open files"?

If you do not communicate with the daemon, does your network extension work as it should, and can you still reproduce this issue?
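One way to answer the file descriptor question from inside a process is to compare the limit the process actually sees with the number of descriptors it currently holds. The following is only a minimal sketch in Python, not the daemon's real code: the function name fd_usage and the SCAN_CAP value are hypothetical, and the scan only looks at the lowest-numbered descriptors, so it undercounts if the process holds more than that.

```python
import os
import resource

# Hypothetical cap on how many descriptor numbers to scan; chosen only
# to keep the scan fast when the limit is very large.
SCAN_CAP = 4096


def fd_usage(scan_cap=SCAN_CAP):
    """Return (soft_limit, hard_limit, open_count) for this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # Scan up to the soft limit, but never more than scan_cap numbers.
    if soft == resource.RLIM_INFINITY or soft > scan_cap:
        upper = scan_cap
    else:
        upper = soft

    open_count = 0
    for fd in range(upper):
        try:
            os.fstat(fd)  # raises OSError (EBADF) if fd is not open
        except OSError:
            continue
        open_count += 1
    return soft, hard, open_count


if __name__ == "__main__":
    soft, hard, open_count = fd_usage()
    print(f"soft={soft} hard={hard} open(<{SCAN_CAP})={open_count}")
```

Logging these numbers periodically, and again right after a restart, would show whether the count climbs toward the limit before networking stops.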

I thought of that, but I don't think that's going to be it -- the launchd.plist file for it sets the number of file descriptors to a million or so, and there are no messages about descriptors. More worryingly, though, the extension stops getting any network flows. While in this state, I had the engineer run printf foo | nc 127.0.0.1 88, which returned immediately with a 0 exit status. And there were no logs in the extension.
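Since nc reports very little, a small probe that names the outcome of the connection attempt could be added to the reproduction script. This is a rough sketch in Python, assuming only the address and port from the command above; the function name probe, the payload, and the 3 second timeout are made up for illustration.

```python
import socket


def probe(host, port, payload=b"foo", timeout=3.0):
    """Try a TCP connect and send; return a short string naming the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(payload)
            return "connected"
    except ConnectionRefusedError:
        return "refused"      # nothing is listening on that port
    except socket.timeout:
        return "timeout"      # no answer within the timeout
    except OSError as e:
        return f"error: {e}"  # e.g. network unreachable


if __name__ == "__main__":
    # Same target as the nc test; point it at an outside host as well to
    # compare loopback behaviour with off-machine behaviour.
    print(probe("127.0.0.1", 88))
```

Running it against both the loopback address and an external host while the machine is in the bad state would show whether the two cases fail differently.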
