Our transparent proxy provider sends flows to a daemon which analyzes and then does proxying. Works fine.
Except that sometimes it stops working. As far as I can tell, it's due to DNS not working. Queries hang -- we've got some internal ones we log, that have timed out after 20 or 30 seconds. Now, clearly, we're doing something bad (because if we kill the daemon and it restarts, everything goes back to working).
Unfortunately, I have forgotten so much I can't figure out how to see where it's broken! Things like dig @8.8.8.8 com. any
fail -- I am presuming because it's trying to do a lookup of "8.8.8.8" and that fails, but I could be wrong. Admittedly, that one doesn't time out, it simply says no servers could be reached. Meanwhile, pinging that address works. (And, also, the local DNS host -- the one provided via DHCP and listed in /etc/resolv.conf
and ipconfig getstatus
-- behaves the same way.)
I haven't been able to reproduce this myself, unfortunately. Although I have, somewhat interestingly, had a similar issue, which was clearly due to a Google Home WiFi access point (as resetting it fixed the problem, as does moving to another area of the house such that a different AP in the mesh takes over).
On my FreeBSD systems, I'd run tcpdump and truss/ktrace on named, but as I said, I've forgotten so much about how macOS does DNS I'm flailing.
Help?