I don't know how to go forward on this one: we have a test engineer who can, reliably, cause networking to simply stop working. Our app has 3 major components -- a proxy daemon, a containing UI app, and a network extension. Because I am lousy at using debuggers, the extension logs every single new flow it gets (to .debug
), as well as a bunch more.
When our engineer gets this problem, the proxy may crash a couple of times, but is still running; the extension is also still running, but no longer gets new flows. Networking outside the machine no longer works. But doing echo foo | nc 127.0.0.1 88
succeeds (or, at least, doesn't print any error -- and also doesn't get any log messages from the extension).
I've got a sysdiagnose from it, as well as a bunch of logs, and all I can really see is that the proxy app restarted, and when it came back, it said there was no networking available. And that the extension stopped logging new flows at about the same time.
I have not been able to reproduce this -- even though our engineer is using the same script I wrote to try to reproduce it, and he can, within an hour. (As opposed to my systems, which have been running for almost a day on both an M1 and Intel system.)
Any ideas of things I should try looking for in the sysdiagnose?