Post

Replies

Boosts

Views

Activity

Transparent Proxy Provider, UDP, mbufs, and inevitable panics
First, for the  employees reading, I filed FB14844573 over the weekend, because this is a reproducible panic or hang. whee I ran our stress tests for an entire long weekend, and my machine panicked, due to mbufs. Normally, I tell my coworkers that we can't really do anything to cause a panic -- but we're doing network things, so this is an exception. I started periodically testing the mbufs while the tests were running -- netstat -m | grep 'mbufs in use' -- and noticed that in fact they were going up, and never decreasing. Even if I killed our code and uninstalled the extensions. (They're increasing at about ~4mbufs/sec.) Today I confirmed that this only happens if we include UDP packets: let udpRule = NENetworkRule(destinationNetwork: host, prefix: 0, protocol: .UDP) let tcpRule = NENetworkRule(destinationNetwork: host, prefix: 0, protocol: .TCP) ... settings.includedNetworkRules = [udpRule, tcpRule] If I comment out that udpRule, part, mbufs don't leak. Our handleNewUDPFlow(:, initialRemoteEndpoint:) method checks to see if the application is a friendly one, and if so it returns false. If it isn't friendly, we want to block QUIC packets: if let host = endpoint as? NWHostEndpoint { if host.port == "80" || host.port == "443" { // We need to open it and then close it flow.open(withLocalEndpoint: nil) { error in Self.workQueue.asyncAfter(deadline: .now() + 0.01) { let err = error ?? POSIXError(POSIXErrorCode.ECONNABORTED) flow.closeReadWithError(err) flow.closeWriteWithError(err) } } return true } } return false Has anyone else run into this? I can't see that it's my problem at that point, since the only thing we do with UDP flows is to either say "we don't want it, you handle it" or "ok sure, we'll take it but then let's close it immediately".
4
0
517
Aug ’24
Ever-increasing mbuf usage
Using our transparent proxy provider, I noticed that the mbuf usage was... weird: 15839/750028 mbufs in use: 15810 mbufs allocated to data 29 mbufs allocated to packet headers 734189 mbufs allocated to caches The amount allocated to caches does go down a bit, but nothing significantly. I started looking into this because I've had a couple of panics from remoted not checking in enough, and it was (as I recall, I can't find the crash logs now) mbuf-related. I've looked through an older version of the xnu source, and nothing jumped out, but that doesn't have the code for the network extension support. I hate mbufs and always have.
2
0
468
Aug ’24
DispatchIO crashes, tips on debugging?
BUG IN CLIENT OF LIBDISPATCH: Unexpected EV_VANISHED (do not destroy random mach ports or file descriptors) Which, ok, clear: somehow a file descriptor is being closed before DispatchIO.close() is called, yes? Only I can't figure out where it is being closed. I am currently using change_fdguard_np() to prevent closes anywhere else, and every single place where I call Darwin.close() is preceded by another call to change_fdguard_npand thenDispatchIO.close()`. eg self.unguardSocket() self.readDispatcher?.close() Darwin.close(self.socket) self.socket = -1 self.completion(self)
11
0
759
Jul ’24
NETransparentProxyProvider excludedRules limit?
I have this in my start code: for p in [4500] + Array(3478...3497) + Array(16384...16387) + Array(16393...16402) { // According to the documentation, I *should* be able to // use "" for the hostname, and prefix:0, but it complained // about the prefix length, so we use the top bit for ipv4 // and ipv6. let port = "\(p)" os_log(.debug, log: Self.log, "Setting up to exclude port %{public}s", port) let host_1 = NWHostEndpoint(hostname:"0.0.0.0", port: port) let host_2 = NWHostEndpoint(hostname:"255.0.0.0", port: port) let host_3 = NWHostEndpoint(hostname:"0::0", port: port) let host_4 = NWHostEndpoint(hostname:"ffff::0", port: port) for host in [host_1, host_3] { let udpPortRule = NENetworkRule(destinationNetwork: host, prefix:1, protocol: .UDP) excludeRules.append(udpPortRule) } } settings.excludedNetworkRules = excludeRules This produces the log message 2024-07-23 11:16:38.335649+0100 0x901984 Debug 0x0 20686 0 com.kithrup.SimpleTPP.Provider: [com.kithrup:Provider] Setting up to exclude port 3483 Later on, when running, I log the new flows in handleNewUDPFlow(:,initialRemoteEndpoint:), and it produces 2024-07-23 11:17:05.712055+0100 0x901984 Debug 0x0 20686 0 com.kithrup.SimpleTPP.Provider: [com.kithrup:Provider] handleNewUDPFlow(_:initialRemoteEndpoint:): new UDP flow for host 17.252.13.7:3483 app com.apple.identityservicesd So port 3483 is definitely in the excludedRules array, but it's not being excluded. (All of this is because I still can't figure out why FaceTime isn't working with us.)
5
0
528
Jul ’24
How do I do unit tests for code using system objects?
That's probably a bad title, let's try with specifics: we have a network extension, it has some classes / functions of its own, and they, when push comes to build, depend on (for example) NEAppProxyFlow and its subclasses. The code is written in Swift, since it is the language of the future. If I want to do a unit test for my code, I need to provide something that at least looks like NEAppProxyFlow, since I can't otherwise create one. I thought I could provide my own NetworkExtension module for test case, but that... did not work well, and I still don't understand why. On the other hand, I'm really bad at making unit tests, so the odds that I'm missing something fairly obvious to most other people are pretty high.
4
0
711
Jul ’24
Assertion failure during deinit due to... DispatchSourceTimer?
I have var idleScanTimer = DispatchSource.makeTimerSource() as a class ivar. When the object is started, I have self.idleScanTimer.schedule(deadline: .now(), repeating: Double(5.0*60)) (and it sets an event handler, that checks some times.) When the object is stopped, it calls self.idleScanTimer.cancel(). At some point, the object containing it is deallocated, and ... sometimes, I think, not always, it crashes: Crashed Thread: 61 Dispatch queue: NEFlow queue [...] Application Specific Information: BUG IN CLIENT OF LIBDISPATCH: Release of an inactive object [...] Thread 61 Crashed:: Dispatch queue: NEFlow queue 0 libdispatch.dylib 0x7ff81c1232cd _dispatch_queue_xref_dispose.cold.2 + 24 1 libdispatch.dylib 0x7ff81c0f84f6 _dispatch_queue_xref_dispose + 55 2 libdispatch.dylib 0x7ff81c0f2dec -[OS_dispatch_source _xref_dispose] + 17 3 com.kithrup.simpleprovider 0x101df5fa7 MyClass.deinit + 87 4 com.kithrup.simpleprovider 0x101dfbdbb MyClass.__deallocating_deinit + 11 5 libswiftCore.dylib 0x7ff829a63460 _swift_release_dealloc + 16 6 com.kithrup.simpleprovider 0x101e122f4 0x101de7000 + 176884 7 libswiftCore.dylib 0x7ff829a63460 _swift_release_dealloc + 16 8 libsystem_blocks.dylib 0x7ff81bfdc654 _Block_release + 130 9 libsystem_blocks.dylib 0x7ff81bfdc654 _Block_release + 130 10 libdispatch.dylib 0x7ff81c0f3317 _dispatch_client_callout + 8 11 libdispatch.dylib 0x7ff81c0f9317 _dispatch_lane_serial_drain + 672 12 libdispatch.dylib 0x7ff81c0f9dfd _dispatch_lane_invoke + 366 13 libdispatch.dylib 0x7ff81c103eee _dispatch_workloop_worker_thread + 753 14 libsystem_pthread.dylib 0x7ff81c2a7fd0 _pthread_wqthread + 326 15 libsystem_pthread.dylib 0x7ff81c2a6f57 start_wqthread + 15 I tried changing it to an optional and having the deinit call .cancel() and set it to nil, but it still crashes. I can't figure out how to get it deallocated in a small, standalone test program.
4
0
1.2k
Jul ’24
Transparent Proxy Providers and DNS
We have found a VPN that does not work while our TPP is running, and I have a hypothesis why, and it does not make any sense. It only fails when our TPP asks for UDP flows. Their VPN claims to fail at a DNS query, but it's getting EPIPE (this is Twingate for the curious). Looking at all the logs I can on the system, including dtruss and dtrace, I see that it does a sendto, and gets that errno. I can't, of course, determine more. By adding more logging, I can see that their VPN tunnel provider tries to open up a UDP flow to 8.8.8.8 port 53. First red flag: I did not think we were supposed to get DNS queries -- my guess is that only means for apps that use the system DNS libraries, implying (to me) that this VPN has their own DNS code. We look at the app name, and decide we don't care for it -- handleNewUDPFlow(_:initialEndpoint:) returns false/NO. I see this in the system logs: 2024-06-26 11:06:56.342680+0100 0x300c839 Default 0x0 40823 0 ${us}.Redirector: (NetworkExtension) [com.apple.networkextension:] [Extension ${us}]: provider rejected new flow UDP ${them}.macos.tunnelprovider[{length = 20, bytes = 0xca1b405e014154c2e38e20159d033f9b2d3eea18}] local port 0 interface en0(bound) which is all correct. But then the very next log entry is 2024-06-26 11:06:56.342717+0100 0x300cc14 Info 0x0 0 0 kernel: (399482302): received connect result 61 which, there you go, ECONNREFUSED which will be turned into EPIPE by sendto. (ETA: No, that's not what happens at all. I see other port 53 queries in my logs, and they follow the same, er, flow -- TPP refuses them, next log entry for the flow by the system is result 61.) There is no traffic to 8.8.8.8 over any of the interfaces. I have tried using a NENetworkRule that _excludes` port 53, but it does not allow that at all. I am very deeply confused by all of this, to the point I'm not quite sure how to begin to articulate a request for help. If anyone has any thoughts, comments, questions, commiserative howls of agony, I'd appreciate it.
1
1
548
Jun ’24
Transparent Proxy Providers and networking
Since we've had a lot of problems with XPC (bad design on my part, I'm sure), I tried changing the data communications between the TPP and the userland proxy to use sockets -- in this case (I've so many, many cases), I am trying to do an http proxy (so the TPP connects to, say, port 12345, sends CONNECT ${host}:${port} HTTP/1.0 X-Proxy-Host: ${host}:${port} It then reads a response, looking for a 200. So that part works -- once I added the networking client entitlement, I could connect and write that and read the response. Now we are cooking with gas, right? The application doing the connection (eg, curl) then sends the normal HTTP request, the TPP gets it, it writes it to the socket it created, the write succeeds (that is, returns the number of bytes in the request), and... it doesn't show up on the interface. (Using tcpdump -i lo0 -s 0 -vvvvvvvvvvvvvvvvvvv -A port 12345.) Since it doesn't show up on the interface, the user-land proxy doesn't get it, and things are very confused for everyone. If the connect() failed, I'd say, ah yes, sandboxed to heck and back, even with the entitlement can't do it. Or if the first write() or read() failed. But they don't fail, and the first round works. If the second write() failed, I could see that. But it both succeeds and doesn't succeed, and quantum confuses the heck out of me.
9
0
769
Jun ’24
Testing a Proxy Provider?
I'm mostly thinking of a Transparent Proxy Provider, as usual, but... how does one test it? I can't see how one would do it with unit tests (although you could break out code and test some of that code). Since it requires MDM or user approval, that makes automated tests a bit difficult. I have this monstrous vision of writing a program that loads the extension and invokes the appropriate methods on it but that just leads to other questions about subclasses. I'm sure other people have thought about this and am curious what the thoughts are. 😄
4
0
537
Jun ’24
ASWebAuthenticationSession and error code 1
We're using this (on a mac) to do 3rd party authentication. The completion handler is getting Authentication session got error: [The operation couldn’t be completed. (com.apple.AuthenticationServices.WebAuthenticationSession error 1.)], in domain: [com.apple.AuthenticationServices.WebAuthenticationSession] That seems to be generated if the auth window is closed. However... it's not being closed, so we end up spawning a second one to do it, and this one seems to work.
2
0
1.2k
Jun ’24
FileHandle over XPC failure?
2024-06-04 15:17:59.618853+0100 ProxyAgent[20233:29237510] [xpc.exceptions] <NSXPCConnection: 0x60000331cb40> connection from pid 20227 on anonymousListener or serviceListener: Exception caught during decoding of received selector newFlowWithIdentifier:to:type:metadata:socket:, dropping incoming message. Exception: Exception while decoding argument 4 (#6 of invocation): <NSInvocation: 0x600001778780> return value: {v} void target: {@} 0x0 selector: {:} null argument 2: {@} 0x6000017787c0 argument 3: {@} 0x60000002d170 argument 4: {q} 1 argument 5: {@} 0x600001746600 argument 6: {@} 0x0 Exception: decodeObjectForKey: Object of class "NSFileHandle" returned nil from -initWithCoder: while being decoded for key <no key> The extension is in Swift; the recipient is in ObjC (wheeeeee). Based on the extension's logging, the FileHandle is not nil. I am trying to pass a FileHandle based on a socketpair up to the user-land code. The sockets are created happily. Any ideas what's going wrong here?
5
0
662
Jun ’24
Transparent Proxy Provider (again) and IPSec: should it work?
As I've mentioned multiple times, we've discovered some very annoying failures when using a TPP, including FaceTime, AirDrop, and some VPNs. (Tailscale works fine, weirdly enough.) In doing some experimentation today with FortiNet, I was able to get the TPP to work if I added the FortiNet server (which, in our case, is an amazon VM) to the TPP's excludedNetworks list. While it is not working, the tcpdump I got for the host was: 15:15:35.584029 IP (tos 0x0, ttl 64, id 1976, offset 0, flags [none], proto UDP (17), length 412) 192.168.43.16.55067 > ${hidden}.ipsec-msft: [udp sum ok] NONESP-encap: isakmp 1.0 msgid 00000000 cookie d66f571dcfc483ba->0000000000000000: phase 1 I ident: (sa: doi=ipsec situation=identity (p: #1 protoid=isakmp transform=2 (t: #1 id=ike (type=lifetype value=sec)(type=lifeduration len=4 value=00015180)(type=enc value=aes)(type=keylen value=0080)(type=auth value=fde9)(type=hash value=sha1)(type=group desc value=modp2048)) (t: #2 id=ike (type=lifetype value=sec)(type=lifeduration len=4 value=00015180)(type=enc value=aes)(type=keylen value=0100)(type=auth value=fde9)(type=hash value=sha2-256)(type=group desc value=modp2048)))) (vid: len=16 4a131c81070358455c5728f20e95452f) (vid: len=16 8f8d83826d246b6fc7a8a6a428c11de8) (vid: len=16 439b59f8ba676c4c7737ae22eab8f582) (vid: len=16 4d1e0e136deafa34c4f3ea9f02ec7285) (vid: len=16 80d0bb3def54565ee84645d4c85ce3ee) (vid: len=16 7d9419a65310ca6f2c179d9215529d56) (vid: len=16 cd60464335df21f87cfdb2fc68b6a448) (vid: len=16 90cb80913ebb696e086381b5ec427b1f) (vid: len=16 4c53427b6d465d1b337bb755a37a7fef) (vid: len=16 b4f01ca951e9da8d0bafbbd34ad3044e) (vid: len=8 09002689dfd6b712) (vid: len=16 12f5f28c457168a9702d9fe274cc0100) (vid: len=16 afcad71368a1f1c96b8696fc77570100) E.......@.....+.6.8c......6......oW........................|...d...........X.......(..............Q........................(..............Q.........................J.....XE\W(...E/........m$ko....(.......C.Y..glLw7."........M...m..4......r........=.TV^.FE..\......}...S..o,....R.V.....`FC5.!.|...h..H........>.in.c...B{.....LSB{mF].3{.U.z..........Q.......J..N.... .&.............Eqh.p-..t...........h...k...wW.. 15:15:35.901666 IP (tos 0x0, ttl 46, id 23154, offset 0, flags [none], proto UDP (17), length 272) ${hidden}.ipsec-msft > 192.168.43.16.55067: [udp sum ok] NONESP-encap: isakmp 1.0 msgid 00000000 cookie d66f571dcfc483ba->d1ec3b9d2f311bf5: phase 1 R ident: (sa: doi=ipsec situation=identity (p: #1 protoid=isakmp transform=1 (t: #1 id=ike (type=lifetype value=sec)(type=lifeduration len=4 value=00015180)(type=enc value=aes)(type=keylen value=0080)(type=auth value=fde9)(type=hash value=sha1)(type=group desc value=modp2048)))) (vid: len=16 4a131c81070358455c5728f20e95452f) (vid: len=16 afcad71368a1f1c96b8696fc77570100) (vid: len=8 09002689dfd6b712) (vid: len=16 12f5f28c457168a9702d9fe274cc0204) (vid: len=16 4c53427b6d465d1b337bb755a37a7fef) (vid: len=16 8299031757a36082c6a621de00000000) (vid: len=16 9b15e65a871aff342666623ba5022e60) (vid: len=16 ca4a4cbb12eab6c58c57067c2e653786) E...Zr......6.8c..+.......Z>.....oW.......;./1.................<...........0.......(..............Q.........................J.....XE\W(...E/........h...k...wW...... .&.............Eqh.p-..t.......LSB{mF].3{.U.z..........W.`...!............Z...4&fb;...`.....JL......W.|.e7. 15:15:35.901756 IP (tos 0x0, ttl 64, id 41586, offset 0, flags [none], proto ICMP (1), length 56) 192.168.43.16 > ${hidden}: ICMP 192.168.43.16 udp port 55067 unreachable, length 36 IP (tos 0x0, ttl 46, id 23154, offset 0, flags [none], proto UDP (17), length 272) ${hidden}.ipsec-msft > 192.168.43.16.55067: [no cksum] [|isakmp_rfc3948] `.....<"..:...E..8.r..@.}q..+.6.8c...Q....E...Zr......6.8c..+......... 15:15:38.904628 IP (tos 0x0, ttl 46, id 23155, offset 0, flags [none], proto UDP (17), length 272) ${hidden}.ipsec-msft > 192.168.43.16.55067: [udp sum ok] NONESP-encap: isakmp 1.0 msgid 00000000 cookie d66f571dcfc483ba->d1ec3b9d2f311bf5: phase 1 R ident: (sa: doi=ipsec situation=identity (p: #1 protoid=isakmp transform=1 (t: #1 id=ike (type=lifetype value=sec)(type=lifeduration len=4 value=00015180)(type=enc value=aes)(type=keylen value=0080)(type=auth value=fde9)(type=hash value=sha1)(type=group desc value=modp2048)))) (vid: len=16 4a131c81070358455c5728f20e95452f) (vid: len=16 afcad71368a1f1c96b8696fc77570100) (vid: len=8 09002689dfd6b712) (vid: len=16 12f5f28c457168a9702d9fe274cc0204) (vid: len=16 4c53427b6d465d1b337bb755a37a7fef) (vid: len=16 8299031757a36082c6a621de00000000) (vid: len=16 9b15e65a871aff342666623ba5022e60) (vid: len=16 ca4a4cbb12eab6c58c57067c2e653786) E...Zs......6.8c..+.......Z>.....oW.......;./1.................<...........0.......(..............Q.........................J.....XE\W(...E/........h...k...wW...... .&.............Eqh.p-..t.......LSB{mF].3{.U.z..........W.`...!............Z...4&fb;...`.....JL......W.|.e7. 15:15:38.904763 IP (tos 0x0, ttl 64, id 8956, offset 0, flags [none], proto ICMP (1), length 56) 192.168.43.16 > ${hidden}: ICMP 192.168.43.16 udp port 55067 unreachable, length 36 IP (tos 0x0, ttl 46, id 23155, offset 0, flags [none], proto UDP (17), length 272) ${hidden}.ipsec-msft > 192.168.43.16.55067: [no cksum] [|isakmp_rfc3948] `.....<"..:...E..8"...@.....+.6.8c...Q....E...Zs......6.8c..+......... So, given that, I tried adding let msftIPSecHost = NWHostEndpoint(hostname: "", port: "4500") let msftIPSecRule = NENetworkRule(destinationNetwork: msftIPSecHost, prefix: 0, protocol: .any) settings.excludedNetworkRules = [msftIPSecRule] and... it worked. At least, the fortinet client worked, and AirDrop transmission worked. Note that I never saw the flows for port 4500 in handleNewUDPFlow(:initialRemoteEndpoint:) -- just having a UDP rule that would intercept them seems to have caused it to fail. Anyone encountered this, or have an explanation? (I am now trying it in our actual product to see how it works.)
3
0
468
May ’24