Monterey 12.0 appproxy network down with dns duration test

Hi there,

We found this with a duration test on Monterey 12.0.

We are using an app proxy provider and pass all UDP traffic through it, including DNS requests (UDP port 53).

With the script below, which does only one nslookup per second, the network fails within a couple of hours. There is no network at all, although no error is reported at the app proxy level. So far, the only way to recover is to uninstall the system extension.

#!/bin/bash
# Duration test: one DNS lookup per second until interrupted.

i=0
while true
do
	i=$((i+1))
	echo "loop $i"
	nslookup www.google.com

	sleep 1
done
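A hypothetical variant of the script above that timestamps every iteration and marks failed lookups, so the moment the network goes down can be matched against the system log or a sysdiagnose. It assumes nslookup exits with a non-zero status when a lookup fails, which is worth confirming on the machine under test; the function name and the iteration limit are illustrative additions, not part of the original test.

```shell
#!/bin/bash
# Hypothetical variant of the duration test: timestamps each iteration and
# reports failures so the failure time can be matched against logs.
# Assumes nslookup exits non-zero when a lookup fails; verify this locally.
# Arguments: host (default www.google.com), maximum iterations (0 = forever).
dns_duration_test() {
	local host="${1:-www.google.com}" max="${2:-0}" i=0
	while [ "$max" -eq 0 ] || [ "$i" -lt "$max" ]; do
		i=$((i + 1))
		if nslookup "$host" > /dev/null 2>&1; then
			echo "$(date '+%Y-%m-%dT%H:%M:%S') loop $i ok"
		else
			echo "$(date '+%Y-%m-%dT%H:%M:%S') loop $i FAILED"
		fi
		sleep 1
	done
}
```

Calling dns_duration_test www.google.com 0 runs until interrupted, like the original script; a positive count gives a bounded run.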

I tried the latest 12.1 beta and got the same failure.

The same test runs without problems on Big Sur, including 11.6.

Thanks in advance for any suggestion.

Regards Richard

I suspect that this is an error somewhere lower in the system that is causing an issue for the proxy and thus taking the network down with it. One possible way to investigate this is to check whether your proxy is leaking file descriptors. A brute-force way to find lingering sockets is to run lsof from Terminal while your proxy is running. Check out man lsof for more details.

Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com
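A minimal sketch of that brute-force lsof check, assuming bash, macOS lsof, and that the proxy process can be found with pgrep (fzmacappproxy is the name used in the lsof commands further down this thread). It prints a timestamped count of all open files and of open Internet sockets at a fixed interval, so a descriptor leak would show up as a count that keeps rising over a multi-hour run. The function names and the 60-second default interval are illustrative, not part of any Apple tool. Note the -a flag: lsof ORs its selection options by default, so -p PID -i without -a would also count every Internet socket on the system.

```shell
#!/bin/bash
# Hypothetical helpers for watching a proxy process for descriptor leaks.

# Print the number of open files (all types) for one PID.
count_open_files() {
	# tail skips lsof's header line; tr strips the padding BSD wc adds.
	lsof -n -P -p "$1" 2>/dev/null | tail -n +2 | wc -l | tr -d ' '
}

# Print the number of open Internet (TCP/UDP) sockets for one PID.
# -a ANDs the -p and -i selections; without it lsof ORs them and would
# list every Internet socket on the system as well.
count_open_sockets() {
	lsof -a -n -P -p "$1" -i 2>/dev/null | tail -n +2 | wc -l | tr -d ' '
}

# Print "timestamp files=N sockets=M" for the newest process matching
# <name> every <interval> seconds; stops when no process matches.
watch_descriptors() {
	local name="$1" interval="${2:-60}" pid
	while pid=$(pgrep -n "$name"); do
		printf '%s files=%s sockets=%s\n' \
			"$(date '+%Y-%m-%dT%H:%M:%S')" \
			"$(count_open_files "$pid")" \
			"$(count_open_sockets "$pid")"
		sleep "$interval"
	done
}
```

For example, in a root shell (sudo -s), source the file and run watch_descriptors fzmacappproxy 60 | tee fd.log, then compare the early and late lines once the network goes down.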

Thanks Matt for the suggestion. I took some lsof traces with the commands below, once at startup and once after the network failed several hours later.
sudo lsof -p $(pgrep fzmacappproxy) | wc -l
sudo lsof -a -p $(pgrep fzmacappproxy) -iTCP -iUDP -n -P | wc -l
sudo lsof -p $(pgrep fzmacappproxy)
sudo lsof -a -p $(pgrep fzmacappproxy) -iTCP -iUDP -n -P

So far, I didn't see much difference.

I minimised the setup to pass through only UDP packets and still see the same issue.

The lsof log still doesn't give much of a clue.

Okay, do you see any kernel logs that bubble up to the surface when these errors take place? For example, ECONNREFUSED or ENOPROTOOPT?

Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com
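A minimal sketch of that search from Terminal, assuming the macOS log command (see man log). It builds one predicate that matches any of the given strings in the message text, case-insensitively, and runs log show over a recent time window. Because log messages often carry the strerror text rather than the constant name, it can be worth searching for both forms (ECONNREFUSED is "Connection refused", ENOPROTOOPT is "Protocol not available"). The function names and the example 3-hour window are illustrative.

```shell
#!/bin/bash
# Hypothetical helpers for searching the unified log for error strings.

# Build a predicate matching any argument in the message text,
# case-insensitively. Embedded double quotes are not escaped.
build_predicate() {
	local term pred=""
	for term in "$@"; do
		if [ -n "$pred" ]; then
			pred="$pred OR "
		fi
		pred="${pred}eventMessage CONTAINS[c] \"$term\""
	done
	printf '%s\n' "$pred"
}

# Search the last <window> of the log (e.g. 3h) for any of the given terms,
# including info- and debug-level messages.
search_log() {
	local window="$1"
	shift
	log show --last "$window" --info --debug --predicate "$(build_predicate "$@")"
}
```

For example: sudo bash -c '. ./logsearch.sh; search_log 3h ECONNREFUSED "Connection refused" ENOPROTOOPT "Protocol not available"'.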

Thanks Matt for the suggestion. I searched the console log and didn't see any ECONNREFUSED or ENOPROTOOPT. I will double-check anyway.

Regards Richard

I double-checked and didn't find the ECONNREFUSED or ENOPROTOOPT errors you mentioned.

Some other test cases I tried:

  • Using only a DNS proxy and passing all DNS requests through it: same failure after a while.

  • Using NWConnection rather than NWUDPSession for the app proxy UDP pass-through: same failure after some time.

  • Not redirecting QUIC packets inside the app proxy: same failure after some time.

Is this a Monterey bug, given that nothing like this happens on Big Sur?

If this worked fine in Big Sur and does not work in Monterey, then yes, you should open a bug report here with a sysdiagnose. Please respond back with the Feedback ID.

Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com

I opened a TSI (Follow-up: 785557535).

Please open a bug report here with a sysdiagnose, and post the Feedback ID once you have done so. It looks like the TSI response was the same here as well, instructing you to open a bug report.

Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com

Bug report: FB9751370 (macOS). https://feedbackassistant.apple.com/feedback/9751370

Thank you! It looks like this shows up in your sysdiagnose:

[C1668 IPv4#c7cef39a:53 waiting path (satisfied (Path is satisfied), viable, interface: en0, ipv4, dns)] event: flow:failed_connect @0.002s, error No space left on device

You may want to run through this with Instruments to make sure you are not leaking memory. Profiling a System Extension in Instruments can be difficult, so I will point to this thread.

Matt Eaton
DTS Engineering, CoreOS
meaton3@apple.com
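As a cheap first step before Instruments, the provider's resident memory can be logged over the same multi-hour run. A minimal sketch, assuming bash, ps -o rss= -p (which reports kilobytes on macOS), and a pgrep-able process name such as the fzmacappproxy used earlier in this thread; rss_mb and watch_rss are hypothetical names. A figure that climbs steadily while the only load is the one-lookup-per-second script would support the memory-leak explanation for the "No space left on device" error; a flat figure would point elsewhere.

```shell
#!/bin/bash
# Hypothetical helpers for logging a process's resident memory over time.

# Print the resident set size of a PID in whole megabytes.
# ps reports RSS in kilobytes; the trailing '=' suppresses the header.
rss_mb() {
	local kb
	kb=$(ps -o rss= -p "$1" | tr -d ' ')
	[ -n "$kb" ] || return 1
	echo $(( kb / 1024 ))
}

# Print "timestamp rss=NMB" for the newest process matching <name> every
# <interval> seconds (default 300); stops when the process disappears.
watch_rss() {
	local name="$1" interval="${2:-300}" pid mb
	while pid=$(pgrep -n "$name") && mb=$(rss_mb "$pid"); do
		printf '%s rss=%sMB\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$mb"
		sleep "$interval"
	done
}
```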

Thanks for the feedback. Very interesting. I am trying to track it down along these lines. I have also uploaded my appproxyprovider.swift (the UDP processing part) to the bug report. Thanks in advance for taking a look.

Regards Richard

Hi there,

I'm not sure whether you managed to reproduce the issue with the reduced project I shared through the bug report.

Just to update you on some new findings from my end: with the same test case (DNS requests from my dns.sh script, UDP pass-through only) but on Big Sur 11.5.2, I see the same memory leak. After 10 hours the system extension uses more than 300 MB, although it starts at only about 20 MB.

I cannot go back any further, as Big Sur 11.5.2 is the last Big Sur installer available on the web.

I wonder whether anything is ongoing in this regard.

Processes: 461 total, 2 running, 459 sleeping, 1737 threads                                   08:20:48
Load Avg: 2.54, 2.06, 1.88 CPU usage: 10.97% user, 5.90% sys, 83.11% idle  SharedLibs: 464M resident, 81M data, 66M linkedit.
MemRegions: 64289 total, 4344M resident, 227M private, 2069M shared. PhysMem: 16G used (3092M wired), 47M unused.
VM: 2487G vsize, 2321M framework vsize, 0(0) swapins, 0(0) swapouts. Networks: packets: 9547216/12G in, 6532190/610M out.
Disks: 2223114/65G read, 2284134/46G written.

PID  COMMAND   %CPU TIME   #TH #WQ #POR MEM  PURG CMPR PGRP PPID STATE  BOOSTS  %CPU_ME %CPU_OTHRS UID FAULTS COW  
93043 com.familyzo 0.4 14:39.27 21  6  73  318M+ 0B  0B  93043 1  sleeping *0[1]   0.01479 0.00000  0  94013+ 1074
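A back-of-the-envelope check on these figures, as an estimate rather than a measurement: the extension grew from about 20 MB to the 318 MB shown by top over roughly 10 hours, while the dns.sh loop ran at most once per second. That works out to at least about 8 KB retained per loop iteration. A roughly fixed cost per lookup would be consistent with some per-flow object never being released, which the memory graph tools discussed in this thread can confirm or rule out. The helper name below is hypothetical; it only performs the division with awk.

```shell
#!/bin/bash
# Hypothetical helper: estimate memory retained per loop iteration, in KB.
# Arguments: start MB, end MB, elapsed hours, seconds per iteration.
# Assumes each iteration takes exactly <seconds per iteration>; real
# iterations take longer (sleep 1 plus the lookup), so the true
# per-iteration figure is somewhat higher than this estimate.
leak_per_iteration_kb() {
	awk -v s="$1" -v e="$2" -v h="$3" -v p="$4" 'BEGIN {
		iterations = h * 3600 / p
		printf "%.2f\n", (e - s) * 1024 / iterations
	}'
}
```

With the numbers above, leak_per_iteration_kb 20 318 10 1 prints 8.48.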

Hi there,

Regarding the thread https://developer.apple.com/forums/thread/62310 on profiling a system extension, I think the relevant content is the part quoted below.

I ran into some questions when I tried it:

  1. My Info.plist is below; I'm not sure whether it is right.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
...
	<key>MallocStackLogging</key>
	<string>1</string>
	<key>MallocStackLoggingNoCompact</key>
	<string>1</string>
</dict>
</plist>

  2. When I start Xcode and attach to the system extension process, it says the process is running as root.

So I ran sudo /Applications/Xcode.app/Contents/MacOS/Xcode and managed to attach to the system extension process, but then it no longer seems to function properly: DNS requests cannot pass through anymore (since I pass all UDP traffic through the extension). Is anything wrong with that? I also tried sudo open /Applications/Xcode.app, but again it cannot attach to the system extension process.

  3. Even if the profiling succeeds, I'm not sure whether it can tell me where the memory leak happens.

Thanks in advance for any suggestion.

Regards Richard

Thanks!

I’ve recently been helping a developer with this in the context of a DTS tech support incident, and I came up with a technique that, while not a full workaround, can help a lot. Here’s the highlights:

You can set environment variables in your provider by modifying its Info.plist. For example, adding an entry like this:

<key>XPCService</key>
<dict>
	<key>EnvironmentVariables</key>
	<dict>
		<key>FOO</key>
		<string>bar</string>
	</dict>
</dict>

will set the FOO environment variable to bar.

That opens up the possibility of using memory management features enabled by environment variables, for example, those documented in the malloc man page. Of those, MallocStackLoggingNoCompact is the heavy hitter.

Once you have MallocStackLoggingNoCompact set, Xcode’s memory graph feature becomes super useful. To wit:

  1. Start your provider in the normal way.
  2. Attach to it with Xcode (Debug > Attach to Process).
  3. In the debug pane, click the memory graph button.

You can explore the memory graph interactively with Xcode, but you can also export it to disk (File > Export Memory Graph). The resulting .memgraph file can be passed to a variety of command-line tools for memory analysis, including heap, leaks, and malloc_history (all of which have their own man page).

You can learn more about memory graphs in WWDC 2018 Session 416 iOS Memory Deep Dive.

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware
let myEmail = "eskimo" + "1" + "@apple.com"
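Once .memgraph files have been exported as described above, the command-line tools can be scripted. A minimal sketch, assuming two exports with the hypothetical names early.memgraph (taken shortly after the provider starts) and late.memgraph (taken a few hours into the nslookup loop). It saves the leaks and heap reports for each file side by side so they can be compared with diff. Only the plain leaks <file> and heap <file> invocations are used; see their man pages for filtering options. Because leaks exits with status 1 when it finds leaks, only a status above 1 is treated as a failure here.

```shell
#!/bin/bash
# Hypothetical helper: save leaks and heap reports for .memgraph files.
# Usage: memgraph_reports <output-dir> <file.memgraph>...
memgraph_reports() {
	local outdir="$1" graph base status
	shift
	mkdir -p "$outdir" || return 1
	for graph in "$@"; do
		base=$(basename "$graph" .memgraph)
		# leaks exits 1 when it finds leaks, so only treat >1 as an error.
		leaks "$graph" > "$outdir/$base.leaks.txt"
		status=$?
		if [ "$status" -gt 1 ]; then
			echo "leaks failed on $graph (status $status)" >&2
			return "$status"
		fi
		heap "$graph" > "$outdir/$base.heap.txt" || return
	done
}
```

For example, memgraph_reports reports early.memgraph late.memgraph followed by diff reports/early.heap.txt reports/late.heap.txt. A class whose instance count grows roughly in step with the number of lookups would be a good candidate to follow up with malloc_history.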

By the way, is there any feedback on the reduced project I shared? Any luck reproducing the issue? And did you find anything in the diagnostics package I shared?

Thanks and regards Richard
