Since we've had a lot of problems with XPC (bad design on my part, I'm sure), I tried changing the data communications between the TPP and the userland proxy to use sockets. In this case (I have so many, many cases), I am trying to do an HTTP proxy, so the TPP connects to, say, port 12345 and sends:
CONNECT ${host}:${port} HTTP/1.0
X-Proxy-Host: ${host}:${port}
It then reads a response, looking for a 200.
So that part works -- once I added the networking client entitlement, I could connect and write that and read the response. Now we are cooking with gas, right?
The application doing the connection (e.g., curl) then sends the normal HTTP request, the TPP gets it and writes it to the socket it created, the write succeeds (that is, it returns the number of bytes in the request), and...
it doesn't show up on the interface. (Using tcpdump -i lo0 -s 0 -vvvvvvvvvvvvvvvvvvv -A port 12345.) Since it doesn't show up on the interface, the user-land proxy doesn't get it, and things are very confused for everyone.
If the connect() failed, I'd say, ah yes, sandboxed to heck and back, even with the entitlement can't do it. Or if the first write() or read() failed. But they don't fail, and the first round works. If the second write() failed, I could see that.
But it both succeeds and doesn't succeed, and quantum confuses the heck out of me.
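For reference, the handshake described above can be sketched as follows. This is a minimal sketch and not the code from the extension: `makeConnectRequest` and `isConnectAccepted` are hypothetical helper names, and it assumes the proxy expects CRLF line endings and a blank line closing the header block.

```swift
import Foundation

/// Builds the CONNECT preamble the TPP sends to the user-land proxy.
/// (Hypothetical helper; the header names come from the post above.)
func makeConnectRequest(host: String, port: Int) -> Data {
    let lines = [
        "CONNECT \(host):\(port) HTTP/1.0",
        "X-Proxy-Host: \(host):\(port)",
        "",   // the two empty entries produce the blank line
        ""    // that terminates the header block
    ]
    return Data(lines.joined(separator: "\r\n").utf8)
}

/// Returns true when the status line of the proxy's reply carries a 200.
func isConnectAccepted(_ response: Data) -> Bool {
    guard let text = String(data: response, encoding: .utf8),
          let statusLine = text.components(separatedBy: "\r\n").first else {
        return false
    }
    let parts = statusLine.split(separator: " ")
    return parts.count >= 2 && parts[0].hasPrefix("HTTP/") && parts[1] == "200"
}
```

Keeping the framing in small pure functions like these makes it possible to check the bytes written on the first round separately from whatever happens to the second write.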
I'm mostly thinking of a Transparent Proxy Provider, as usual, but... how does one test it? I can't see how one would do it with unit tests (although you could break out code and test some of that code). Since it requires MDM or user approval, that makes automated tests a bit difficult. I have this monstrous vision of writing a program that loads the extension and invokes the appropriate methods on it but that just leads to other questions about subclasses.
I'm sure other people have thought about this and am curious what the thoughts are. 😄
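One way the "break out code and test some of that code" idea could look is sketched below. Everything here is an assumption for illustration: `FlowSummary` and `FlowPolicy` are hypothetical types, and the provider would have to copy the relevant values out of the real flow object before calling into them. It does not address loading the extension itself.

```swift
import Foundation

/// Hypothetical, framework-free description of a flow. In the provider,
/// these values would be copied out of the flow and its remote endpoint.
struct FlowSummary {
    let signingIdentifier: String
    let remoteHost: String
    let remotePort: Int
    let isUDP: Bool
}

/// Hypothetical policy object holding the decision the provider would make.
struct FlowPolicy {
    var bypassedApps: Set<String>
    var interceptUDP: Bool

    /// true = the proxy takes the flow, false = hand it back to the system.
    func shouldHandle(_ flow: FlowSummary) -> Bool {
        if bypassedApps.contains(flow.signingIdentifier) { return false }
        if flow.isUDP && !interceptUDP { return false }
        return true
    }
}
```

With the decision isolated like this, an ordinary unit test can cover the policy without MDM or user approval; only the thin layer that talks to NetworkExtension is left untested.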
We're using this (on a mac) to do 3rd party authentication. The completion handler is getting
Authentication session got error: [The operation couldn’t be completed. (com.apple.AuthenticationServices.WebAuthenticationSession error 1.)], in domain: [com.apple.AuthenticationServices.WebAuthenticationSession]
That seems to be generated if the auth window is closed. However... it's not being closed, so we end up spawning a second one to do it, and this one seems to work.
2024-06-04 15:17:59.618853+0100 ProxyAgent[20233:29237510] [xpc.exceptions] <NSXPCConnection: 0x60000331cb40> connection from pid 20227 on anonymousListener or serviceListener: Exception caught during decoding of received selector newFlowWithIdentifier:to:type:metadata:socket:, dropping incoming message.
Exception: Exception while decoding argument 4 (#6 of invocation):
<NSInvocation: 0x600001778780>
return value: {v} void
target: {@} 0x0
selector: {:} null
argument 2: {@} 0x6000017787c0
argument 3: {@} 0x60000002d170
argument 4: {q} 1
argument 5: {@} 0x600001746600
argument 6: {@} 0x0
Exception: decodeObjectForKey: Object of class "NSFileHandle" returned nil from -initWithCoder: while being decoded for key <no key>
The extension is in Swift; the recipient is in ObjC (wheeeeee).
Based on the extension's logging, the FileHandle is not nil.
I am trying to pass a FileHandle based on a socketpair up to the user-land code. The sockets are created happily.
Any ideas what's going wrong here?
As I've mentioned multiple times, we've discovered some very annoying failures when using a TPP, including FaceTime, AirDrop, and some VPNs. (Tailscale works fine, weirdly enough.) In doing some experimentation today with FortiNet, I was able to get the TPP to work if I added the FortiNet server (which, in our case, is an Amazon VM) to the TPP's excludedNetworkRules list.
While it is not working, the tcpdump I got for the host was:
15:15:35.584029 IP (tos 0x0, ttl 64, id 1976, offset 0, flags [none], proto UDP (17), length 412)
192.168.43.16.55067 > ${hidden}.ipsec-msft: [udp sum ok] NONESP-encap: isakmp 1.0 msgid 00000000 cookie d66f571dcfc483ba->0000000000000000: phase 1 I ident:
(sa: doi=ipsec situation=identity
(p: #1 protoid=isakmp transform=2
(t: #1 id=ike (type=lifetype value=sec)(type=lifeduration len=4 value=00015180)(type=enc value=aes)(type=keylen value=0080)(type=auth value=fde9)(type=hash value=sha1)(type=group desc value=modp2048))
(t: #2 id=ike (type=lifetype value=sec)(type=lifeduration len=4 value=00015180)(type=enc value=aes)(type=keylen value=0100)(type=auth value=fde9)(type=hash value=sha2-256)(type=group desc value=modp2048))))
(vid: len=16 4a131c81070358455c5728f20e95452f)
(vid: len=16 8f8d83826d246b6fc7a8a6a428c11de8)
(vid: len=16 439b59f8ba676c4c7737ae22eab8f582)
(vid: len=16 4d1e0e136deafa34c4f3ea9f02ec7285)
(vid: len=16 80d0bb3def54565ee84645d4c85ce3ee)
(vid: len=16 7d9419a65310ca6f2c179d9215529d56)
(vid: len=16 cd60464335df21f87cfdb2fc68b6a448)
(vid: len=16 90cb80913ebb696e086381b5ec427b1f)
(vid: len=16 4c53427b6d465d1b337bb755a37a7fef)
(vid: len=16 b4f01ca951e9da8d0bafbbd34ad3044e)
(vid: len=8 09002689dfd6b712)
(vid: len=16 12f5f28c457168a9702d9fe274cc0100)
(vid: len=16 afcad71368a1f1c96b8696fc77570100)
E.......@.....+.6.8c......6......oW........................|...d...........X.......(..............Q........................(..............Q.........................J.....XE\W(...E/........m$ko....(.......C.Y..glLw7."........M...m..4......r........=.TV^.FE..\......}...S..o,....R.V.....`FC5.!.|...h..H........>.in.c...B{.....LSB{mF].3{.U.z..........Q.......J..N.... .&.............Eqh.p-..t...........h...k...wW..
15:15:35.901666 IP (tos 0x0, ttl 46, id 23154, offset 0, flags [none], proto UDP (17), length 272)
${hidden}.ipsec-msft > 192.168.43.16.55067: [udp sum ok] NONESP-encap: isakmp 1.0 msgid 00000000 cookie d66f571dcfc483ba->d1ec3b9d2f311bf5: phase 1 R ident:
(sa: doi=ipsec situation=identity
(p: #1 protoid=isakmp transform=1
(t: #1 id=ike (type=lifetype value=sec)(type=lifeduration len=4 value=00015180)(type=enc value=aes)(type=keylen value=0080)(type=auth value=fde9)(type=hash value=sha1)(type=group desc value=modp2048))))
(vid: len=16 4a131c81070358455c5728f20e95452f)
(vid: len=16 afcad71368a1f1c96b8696fc77570100)
(vid: len=8 09002689dfd6b712)
(vid: len=16 12f5f28c457168a9702d9fe274cc0204)
(vid: len=16 4c53427b6d465d1b337bb755a37a7fef)
(vid: len=16 8299031757a36082c6a621de00000000)
(vid: len=16 9b15e65a871aff342666623ba5022e60)
(vid: len=16 ca4a4cbb12eab6c58c57067c2e653786)
E...Zr......6.8c..+.......Z>.....oW.......;./1.................<...........0.......(..............Q.........................J.....XE\W(...E/........h...k...wW...... .&.............Eqh.p-..t.......LSB{mF].3{.U.z..........W.`...!............Z...4&fb;...`.....JL......W.|.e7.
15:15:35.901756 IP (tos 0x0, ttl 64, id 41586, offset 0, flags [none], proto ICMP (1), length 56)
192.168.43.16 > ${hidden}: ICMP 192.168.43.16 udp port 55067 unreachable, length 36
IP (tos 0x0, ttl 46, id 23154, offset 0, flags [none], proto UDP (17), length 272)
${hidden}.ipsec-msft > 192.168.43.16.55067: [no cksum] [|isakmp_rfc3948]
`.....<"..:...E..8.r..@.}q..+.6.8c...Q....E...Zr......6.8c..+.........
15:15:38.904628 IP (tos 0x0, ttl 46, id 23155, offset 0, flags [none], proto UDP (17), length 272)
${hidden}.ipsec-msft > 192.168.43.16.55067: [udp sum ok] NONESP-encap: isakmp 1.0 msgid 00000000 cookie d66f571dcfc483ba->d1ec3b9d2f311bf5: phase 1 R ident:
(sa: doi=ipsec situation=identity
(p: #1 protoid=isakmp transform=1
(t: #1 id=ike (type=lifetype value=sec)(type=lifeduration len=4 value=00015180)(type=enc value=aes)(type=keylen value=0080)(type=auth value=fde9)(type=hash value=sha1)(type=group desc value=modp2048))))
(vid: len=16 4a131c81070358455c5728f20e95452f)
(vid: len=16 afcad71368a1f1c96b8696fc77570100)
(vid: len=8 09002689dfd6b712)
(vid: len=16 12f5f28c457168a9702d9fe274cc0204)
(vid: len=16 4c53427b6d465d1b337bb755a37a7fef)
(vid: len=16 8299031757a36082c6a621de00000000)
(vid: len=16 9b15e65a871aff342666623ba5022e60)
(vid: len=16 ca4a4cbb12eab6c58c57067c2e653786)
E...Zs......6.8c..+.......Z>.....oW.......;./1.................<...........0.......(..............Q.........................J.....XE\W(...E/........h...k...wW...... .&.............Eqh.p-..t.......LSB{mF].3{.U.z..........W.`...!............Z...4&fb;...`.....JL......W.|.e7.
15:15:38.904763 IP (tos 0x0, ttl 64, id 8956, offset 0, flags [none], proto ICMP (1), length 56)
192.168.43.16 > ${hidden}: ICMP 192.168.43.16 udp port 55067 unreachable, length 36
IP (tos 0x0, ttl 46, id 23155, offset 0, flags [none], proto UDP (17), length 272)
${hidden}.ipsec-msft > 192.168.43.16.55067: [no cksum] [|isakmp_rfc3948]
`.....<"..:...E..8"...@.....+.6.8c...Q....E...Zs......6.8c..+.........
So, given that, I tried adding
let msftIPSecHost = NWHostEndpoint(hostname: "", port: "4500")
let msftIPSecRule = NENetworkRule(destinationNetwork: msftIPSecHost, prefix: 0, protocol: .any)
settings.excludedNetworkRules = [msftIPSecRule]
and... it worked. At least, the FortiNet client worked, and AirDrop transmission worked.
Note that I never saw the flows for port 4500 in handleNewUDPFlow(_:initialRemoteEndpoint:) -- just having a UDP rule that would intercept them seems to have caused it to fail.
Anyone encountered this, or have an explanation? (I am now trying it in our actual product to see how it works.)
I've come to the conclusion that TPP and UDP are just utterly wonky together.
This is my relevant code:
let host = NWHostEndpoint(hostname: "", port: "0")
let udpRule = NENetworkRule(destinationNetwork: host, prefix: 0, protocol: .UDP)
let tcpRule = NENetworkRule(destinationNetwork: host, prefix: 0, protocol: .TCP)
let settings = NETransparentProxyNetworkSettings(tunnelRemoteAddress:"127.0.0.1")
/*
 * The next four lines are a hack and an experiment
 */
let quicHost_1 = NWHostEndpoint(hostname: "", port: "80")
let quicHost_2 = NWHostEndpoint(hostname: "", port: "443")
let quicRule_1 = NENetworkRule(destinationNetwork: quicHost_1, prefix: 0, protocol: .UDP)
let quicRule_2 = NENetworkRule(destinationNetwork: quicHost_2, prefix: 0, protocol: .UDP)
settings.includedNetworkRules = [quicRule_1, quicRule_2, tcpRule]
settings.excludedNetworkRules = nil
Directing UDP through a TPP breaks FaceTime, AirDrop, and a bunch of VPNs
Despite the documentation's implication that you can't do DNS control with a TPP ("A port string of 53 is not allowed. Use Destination Domain-based rules to match DNS traffic."), if I opt into UDP (settings.includedNetworkRules = [udpRule, tcpRule]), then I see traffic to port 53, and can do things with it.
If I use a wild-card network rule (the code above), then the TPP does not seem to get any UDP flows at all.
If I use a wild-card exclusion rule (using NWHostEndpoint(hostname: "", port: "53")), then everything starts breaking.
If I use NENetworkRule(destinationHost: host, protocol: .UDP), it complains because the prefix must be 32 or less.
I've filed feedbacks, and engaged with eskimo (really, thank you), and looked at previous threads, so mostly this is begging: has anyone gotten this to work as expected? I no longer think I'm being obviously wrong with my code, but I would be super delighted to find out I've missed some tricks or angles.
I ran it (Leaks) on a process for about 2 hours. It collected 68 GB of data. It cannot open the folder -- it can't find a file (which is there as a .zip archive), or, if I expand it, it just gives an error about a missing index.
Filing a bug about this is difficult, since it's 68 GB of data.
Because it may be quicker to ask: with a TPP, readData() gets a data size of 0 if the process has finished writing to the network. However, there seems to be no way to find out if it has finished reading from the network, other than to do a .write() and see if you get an error. (I filed a FB about this, for whatever that's worth.)
Since the API is flow-based, not socket, it's not possible to tell if the app has set its own timeout. Or exited. So one question I have is: if I do flow.write(Data(count:0)) -- is that a possible way to determine if it's still around? Or will it be interpreted as read(2) returning 0?
(Putting this in for testing is difficult, but not impossible -- as I said, this might be the quickest way to find out.)
On macOS, that is. The goals are largely for testing, where we'd like to know the maximum and minimum memory our processes are using, but we'd also like to know it on crash.
Our current method is to use ps periodically and grab the appropriate field, but is there a better way? (I looked at MetricKit, but it's not as useful on macOS; I filed FB13640765 "MetricKit would be awesome with more mac features" a couple of months ago.)
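If the process can report on itself, one alternative to polling ps is `getrusage`, which tracks the peak resident set size. This is a minimal sketch under that assumption: `peakResidentBytes` is a hypothetical helper, it only covers the calling process, it gives the maximum but not the minimum, and it does nothing for the crash case.

```swift
import Foundation

/// Peak resident set size of the calling process, or nil when unavailable.
/// Hypothetical helper; only implemented for Apple platforms in this sketch.
func peakResidentBytes() -> UInt64? {
#if canImport(Darwin)
    var usage = rusage()
    guard getrusage(RUSAGE_SELF, &usage) == 0 else { return nil }
    // To the best of my knowledge macOS reports ru_maxrss in bytes
    // (Linux uses kilobytes), so no scaling is applied here.
    return UInt64(usage.ru_maxrss)
#else
    return nil
#endif
}

if let peak = peakResidentBytes() {
    print("peak resident size: \(peak) bytes")
}
```

Logging this value at interesting points (or at exit) would give the maximum without an external sampler.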
This one is sorta behaving similar to the FaceTime / AirDrop issue, but it does depend on order, which makes me wonder if it's a programming choice. Specifically, using FortiNet's VPN client, using IPSec, if I have a TPP installed and then try to connect it, it fails. If, however, I connect and then start the TPP, it succeeds, which at least makes it better than FaceTime and AirDrop.
So my question here is... hm, not as well-articulated as I would like. I'm curious if a VPN can check to see if other VPNs are installed and configured, and if so say "nope." Hm, saying that more clearly: I think it's possible for a network extension to check the interface that a packet/flow is going to, and cause a failure of some sort if it's a VPN, correct? Does anyone do that? Or am I seeing lions in the waterhole weeds?
I'm also curious if Apple's networking code has issues with multiple VPNs. (Although, I will note, our TPP works just fine with Tailscale, so it's not an inherent conflict. Also Cisco AnyConnect. So maybe it's just IPSec?)
ETA: to make it clear, my test case involves using a ****** TPP, where handleNewUDPFlow and handleNewFlow both immediately return false, meaning that the system should behave as if it's not there, and yet... doesn't.
I appreciate any comments/assistance/guffaws.
Even when it is disabled (that is, our app says "don't do anything" and all it does is start logging things).
On the mac, when I try to make an outgoing audio-only call (it's a mac mini with no camera), it seems to connect as far as the outside is concerned, but nothing happens -- I get a request on my other devices, with the wrong account, and the mac mini says it's failed while the ipad or iphone keep connected.
I am logging everything I can think of in our extensions, and they don't seem to show anything of interest. And I can't figure out what to look for in the entirety of system logs. I do see Messages dropped during live streaming (use log show to see what they were)... but I'm not sure what to look for in the log show.
If I try to make a call in, it results in what seems to be an iOS FaceTime bug -- the phone tells me to log into FaceTime. Even though I am logged in.
I have this code in a network extension:
private func pathForToken(token: audit_token_t) -> String? {
    var tokenCopy = token
    let bufferSize = UInt32(4096)
    let bytes = UnsafeMutablePointer<UInt8>.allocate(capacity: Int(bufferSize))
    let length = proc_pidpath_audittoken(&tokenCopy, bytes, bufferSize)
    if length != 0 {
        return String(cString: bytes).lowercased()
    }
    return nil
}
bytes appears to be leaked -- the call stack is pathForToken(token:) to specialized static UnsafeMutablePointer.allocate(capacity:).
Do I need to do something to ensure bytes is released, since it doesn't seem to be happening on its own?
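Memory obtained from `UnsafeMutablePointer.allocate(capacity:)` is not reference counted, so it stays allocated until `deallocate()` is called on it. A minimal sketch of the usual pattern follows; `withCStringBuffer` is a hypothetical helper, and the closure stands in for the C call that fills the buffer.

```swift
import Foundation

/// Allocates a zeroed byte buffer, lets `fill` write a NUL-terminated string
/// into it, and always releases the buffer before returning.
/// (Hypothetical helper; `fill` returns the length written, 0 or less on failure.)
func withCStringBuffer(
    capacity: Int,
    _ fill: (UnsafeMutablePointer<UInt8>, UInt32) -> Int32
) -> String? {
    let bytes = UnsafeMutablePointer<UInt8>.allocate(capacity: capacity)
    bytes.initialize(repeating: 0, count: capacity)
    // defer runs on every exit path, including the early return below.
    defer { bytes.deallocate() }

    let length = fill(bytes, UInt32(capacity))
    guard length > 0 else { return nil }
    // String(cString:) copies the bytes, so freeing the buffer afterwards is safe.
    return String(cString: bytes)
}
```

In pathForToken, the closure body would be the existing call, for example `proc_pidpath_audittoken(&tokenCopy, buffer, size)`, or a single `defer { bytes.deallocate() }` could be added right after the allocation in the original function.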
Two different crash patterns -- one an abort, the other complaining about a lock being corrupt or owning thread having exited. The first one is:
Thread 1 Crashed:: Dispatch queue: com.apple.root.default-qos.overcommit
0 libsystem_platform.dylib 0x18fc10244 _os_unfair_lock_corruption_abort + 88
1 libsystem_platform.dylib 0x18fc0b788 _os_unfair_lock_lock_slow + 332
2 libobjc.A.dylib 0x18f820c90 objc_sync_enter + 20
3 com.kithrup.TPProvider 0x100d2eee0 closure #3 in TPProvider.startProxy(options:completionHandler:) + 340
4 com.kithrup.TPProvider 0x100d2d980 thunk for @escaping @callee_guaranteed () -> () + 28
5 libdispatch.dylib 0x18fa31910 _dispatch_client_callout + 20
6 libdispatch.dylib 0x18fa34dc8 _dispatch_continuation_pop + 600
7 libdispatch.dylib 0x18fa48be4 _dispatch_source_latch_and_call + 420
8 libdispatch.dylib 0x18fa477b4 _dispatch_source_invoke + 832
9 libdispatch.dylib 0x18fa431f4 _dispatch_root_queue_drain + 392
10 libdispatch.dylib 0x18fa43a04 _dispatch_worker_thread2 + 156
11 libsystem_pthread.dylib 0x18fbdb0d8 _pthread_wqthread + 228
12 libsystem_pthread.dylib 0x18fbd9e30 start_wqthread + 8
while the other one is:
Application Specific Information:
BUG IN CLIENT OF LIBPLATFORM: os_unfair_lock is corrupt, or owner thread exited without unlocking
Abort Cause 198194
Thread 1 Crashed:: Dispatch queue: com.apple.root.default-qos.overcommit
0 libsystem_platform.dylib 0x18fc10220 _os_unfair_lock_corruption_abort + 52
1 libsystem_platform.dylib 0x18fc0b788 _os_unfair_lock_lock_slow + 332
2 libobjc.A.dylib 0x18f820c90 objc_sync_enter + 20
3 com.kithrup.TPProvider 0x104e86ee0 closure #3 in TPProvider.startProxy(options:completionHandler:) +340
4 com.kithrup.TPProvider 0x104e85980 thunk for @escaping @callee_guaranteed () -> () + 28
5 libdispatch.dylib 0x18fa31910 _dispatch_client_callout + 20
6 libdispatch.dylib 0x18fa34dc8 _dispatch_continuation_pop + 600
7 libdispatch.dylib 0x18fa48be4 _dispatch_source_latch_and_call + 420
8 libdispatch.dylib 0x18fa477b4 _dispatch_source_invoke + 832
9 libdispatch.dylib 0x18fa431f4 _dispatch_root_queue_drain + 392
10 libdispatch.dylib 0x18fa43a04 _dispatch_worker_thread2 + 156
11 libsystem_pthread.dylib 0x18fbdb0d8 _pthread_wqthread + 228
12 libsystem_pthread.dylib 0x18fbd9e30 start_wqthread + 8
Our TPProvider, whenever it uses a dispatch queue, uses a custom one, so these are presumably system queues and locks. My best guess would be some XPC command took too long? But that's just a WAG.
Any ideas about what is actually going on?
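Frame 3 in both traces is the provider's own closure #3 in startProxy calling objc_sync_enter from a dispatch source handler on a root queue. One experiment, offered as an assumption and not a diagnosis, would be to guard whatever that closure touches with an explicit lock that lives as long as the state it protects. `FlowCounter` below is a hypothetical stand-in for that shared state.

```swift
import Foundation

/// Hypothetical shared state guarded by a lock owned by the same object,
/// so the lock cannot go away while a handler is still using it.
final class FlowCounter {
    private let lock = NSLock()
    private var count = 0

    func increment() {
        lock.lock()
        defer { lock.unlock() }
        count += 1
    }

    var value: Int {
        lock.lock()
        defer { lock.unlock() }
        return count
    }
}

/// Hammers the counter from many threads at once and returns the final value.
func runCounterDemo() -> Int {
    let counter = FlowCounter()
    DispatchQueue.concurrentPerform(iterations: 1000) { _ in
        counter.increment()
    }
    return counter.value
}
```

If the crash disappears with a lock like this in place of the synchronized block, that would point at the object being passed to objc_sync_enter and away from the system queues.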
We added a packet filter to our app, then found a way to not need it, so we want to be able to remove it on upgrades. But we don't want to install it if it's not already installed. Simple, right?
The basic flow of the code is: on start-up, it issues a propertiesRequestForExtension request. The delegate method goes through the various versions, ignoring any where property.isEnabled == NO. When it comes to one that is enabled, it checks the version -- if it's the same version as the running app, it goes to deactivate it. If it's a different version, it goes to enable the current version (creating an activationRequestForExtension request).
This should all be very simple. Except.
At some point during this, the properties request gets a failure -- Domain=OSSystemExtensionErrorDomain Code=1. Ok, it seems there are lots of them lying around (I haven't rebooted in a while), and that method doesn't return once it finds one that is enabled. So maybe it doesn't like that.
And then the activation request that was submitted also fails, also with the same error that doesn't explain anything.
I thought, ok, maybe they don't like to step on each other's toes, so let's create a serial dispatch queue, and have all of the system extension requests use that queue. That way, the activation request won't begin until the properties request has finished!
Only I did that. And it did get a bit further -- the request method was invoked! Only then I still got messages about the properties and activation requests failing with the same unknown error.
So then I looked at console. And sysextd is crashing, every time this happens. And then I dump all of the logs around that time, and look through them, and see... nothing.
I had hoped to end this with a description of how I achieved victory, but instead... I'm going to have to reboot and see if that solves the mysterious crashing of sysextd.
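The decision described above can be sketched as a pure function that stops at the first enabled entry, so only one follow-up request is ever submitted. This is a sketch under assumptions: `ExtensionInfo`, `ExtensionAction`, and `action(for:appVersion:)` are hypothetical names, and the struct only mirrors the two values the delegate is described as looking at.

```swift
import Foundation

/// Hypothetical mirror of the values read from each reported extension.
struct ExtensionInfo {
    let shortVersion: String
    let isEnabled: Bool
}

/// What the app should do after looking at the reported extensions.
enum ExtensionAction: Equatable {
    case nothing          // no enabled copy found, so do not install one
    case deactivate       // enabled copy matches the running app's version
    case activateCurrent  // enabled copy is a different version
}

/// Picks a single action based on the first enabled entry and ignores the rest.
func action(for found: [ExtensionInfo], appVersion: String) -> ExtensionAction {
    guard let enabled = found.first(where: { $0.isEnabled }) else {
        return .nothing
    }
    return enabled.shortVersion == appVersion ? .deactivate : .activateCurrent
}
```

Returning one action makes it easier to rule out the case where several enabled-looking entries each trigger their own request.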
The profiles command shows them, but the Store file/directory is blocked off from access (which, I suppose, kinda makes sense).
(We are in the process of getting customers to upgrade the profile, and if I can see whether our profile has an entry, then I can behave differently.)