Most this particular session has been in order to come up with a way to have tests for our transparent proxy provider that can be run without using the main extension -- so I rolled the networking code out, wrote a mimic of NEAppProxyFlow, and then wrote tests for it. It uses socket I/O to communicate with each side ("app" and "TPP"). Internally, it uses DispachIO to handle data sent from the "app", while it exposes the "TPP" side as a FileHandle (and it does that so that I could, if I needed to, pass it over XPC).
This isn't the only way I could have implemented it, admittedly, but it grew out of my first approach, which was to use kqueue.
(Oh, and I got the FileHandle issue solved -- the class behaves super badly if it gets closed more than once, so I isolated that code into a function, turned the object into an optional, and set it to nil after closing it.)
Since I don't think I'll end up using XPC to pass the FileHandle around, I can probably turn it all into DispatchIO. I hadn't primarily because I kept getting crashes, and/or leaked file descriptors, but now that I can get the DispatchIO part working without crashes, it's time to revisit it.
Post
Replies
Boosts
Views
Activity
Ok, using that information, I was able to reproduce it in a standalone program, which helped me track it down. (And then I had to deal with the cleanupHandler not being called, but I found that out and got it done.)
Now I'm sometimes getting a crash:
fileHandle?.readabilityHandler = { fh in
let data = try fh.availableData
print("Got \(data.count) bytes")
}
*** Terminating app due to uncaught exception 'NSFileHandleOperationException', reason: '*** -[NSConcreteFileHandle availableData]: Bad file descriptor'
This only happens sometimes, in my tests.
Ok, I am going to say that is horribly unclear then -- I read it completely differently, and "the file descriptor should not be closed by the application until the cleanup handler is called" would have made all the difference... 😄
And now I can't get it to call the cleanup handler at all, sigh.
The basic flow of the code is:
Create a socketpair
Create a DispatchIO using socket[0]
Create a FileHandle using socket[1]
Do work
When done, DispatchIO.close() and FileHandle.close()
Also close the sockets using Darwin.close().
CRASH!
The problematical one seems to be the file descriptor for the DispatchIO.
So, yay, I can reproduce it in my test code. And I used fdguard (with CLOSE not DUP), and never unguard the file descriptors, and it still gets the EV_VANISH crash.
The crashed thread is
Thread 5 Crashed:
0 libdispatch.dylib 0x7ff80af41e5f _dispatch_source_merge_evt.cold.1 + 24
1 libdispatch.dylib 0x7ff80af25bb1 _dispatch_source_merge_evt + 149
2 libdispatch.dylib 0x7ff80af2f9b6 _dispatch_event_loop_merge + 112
3 libdispatch.dylib 0x7ff80af21db3 _dispatch_workloop_worker_thread + 438
4 libsystem_pthread.dylib 0x7ff80b0c5fd0 _pthread_wqthread + 326
5 libsystem_pthread.dylib 0x7ff80b0c4f57 start_wqthread + 15
I've attached a trimmed & edited copy of the full crash log (but not the entire .ips file).
crash.txt
FB14453214, alas. I'm going to try my minimal TPP as well.
Here is the (somewhat edited) sequence of system logs for a flow. I pared down the excludedRules a bit, and added:
let appleRule = NENetworkRule(destinationNetwork: NWHostEndpoint(hostname: "17.0.0.0", port: "0"), prefix: 8, protocol: .any)
so nothing in 17/8 should have come to the provider, right? And yet it did. I must be doing something very very wrong, but I can't figure it out yet.
com.kithrup.SimpleTPP.Provider: (libnetworkextension.dylib) [com.apple.networkextension:] (0): Flow 1864790775 is connecting
com.kithrup.SimpleTPP.Provider: (libnetworkextension.dylib) [com.apple.networkextension:] (1864790775): New flow: NEFlow type = datagram, app = com.apple.identityservicesd, name = , 192.168.43.105:16393 <-> 17.252.14.81:3478, filter_id = , interface = en0(bound)
com.kithrup.SimpleTPP.Provider: (NetworkExtension) [com.apple.networkextension:] [Extension com.kithrup.SimpleTPP.Provider]: Calling handleNewUDPFlow with UDP com.apple.identityservicesd[{length = 20, bytes = 0x9369f2790daea880d6d0dca66519aa9ae04f7bcc}] local port 16393 interface en0(bound), remoteEndpoint = 17.252.14.81:3478
com.kithrup.SimpleTPP.Provider: [com.kithrup.SimpleTPP.Provider:redirector] handleNewUDPFlow(_:initialRemoteEndpoint:): new UDP flow for host 17.252.14.81:3478 app com.apple.identityservicesd
com.kithrup.SimpleTPP.Provider: [com.kithrup.SimpleTPP.Provider:redirector] Got flow for com.apple.identityservicesd
identityservicesd: (libquic.dylib) [com.apple.network:quic] quic_crypto_new_flow [C382:2] [b3582e00-b4c3556dcba3be09] TLS stream is: [C383]
identityservicesd: (Network) [com.apple.network:connection] [C383 C99F226F-F5A6-49B2-9AE1-ACE580B7619A IPv4#c62ff9e2:3478 quic, tls, definite, attribution: developer, reuse local address, context: IDSRealTime (private), proc: 24A8D0C7-FB75-37DE-8065-5EB68A7790DE, effective proc: 7FD7A321-FDAF-3CF4-926E-BF555C540CE0, local address: 192.168.43.105:16393, required interface: en0(13), has demux] start
identityservicesd: (Network) [com.apple.network:connection] [C383 IPv4#c62ff9e2:3478 initial socket-flow (satisfied (Path is satisfied), interface: en0[802.11], scoped, ipv4, ipv6, dns, uses wifi)] event: path:start @0.000s
identityservicesd: (Network) [com.apple.network:connection] [C383 IPv4#c62ff9e2:3478 waiting socket-flow (satisfied (Path is satisfied), interface: en0[802.11], scoped, ipv4, ipv6, dns, uses wifi)] event: path:satisfied @0.000s, uuid: A457D7C0-8173-4B04-B722-747C0287E464
identityservicesd: (Network) [com.apple.network:connection] [C383 IPv4#c62ff9e2:3478 in_progress socket-flow (satisfied (Path is satisfied), interface: en0[802.11], scoped, ipv4, ipv6, dns, uses wifi)] event: flow:start_connect @0.000s
identityservicesd: (Network) [com.apple.network:connection] nw_connection_report_state_with_handler_on_nw_queue [C383] reporting state preparing
identityservicesd: (Network) [com.apple.network:connection] nw_flow_connected [C383 IPv4#c62ff9e2:3478 in_progress socket-flow (satisfied (Path is satisfied), interface: en0[802.11], scoped, ipv4, ipv6, dns, uses wifi)] Joined protocol connected (quic)
identityservicesd: (Network) [com.apple.network:connection] [C383 IPv4#c62ff9e2:3478 in_progress socket-flow (satisfied (Path is satisfied), interface: en0[802.11], scoped, ipv4, ipv6, dns, uses wifi)] event: flow:finish_transport @0.000s
com.kithrup.SimpleTPP.Provider: [com.kithrup.SimpleTPP.Provider:redirector] We are bypassing the app com.apple.identityservicesd (path /system/library/privateframeworks/ids.framework/identityservicesd.app/contents/macos/identityservicesd)!
com.kithrup.SimpleTPP.Provider: (NetworkExtension) [com.apple.networkextension:] [Extension com.kithrup.SimpleTPP.Provider]: provider rejected new flow UDP com.apple.identityservicesd[{length = 20, bytes = 0x9369f2790daea880d6d0dca66519aa9ae04f7bcc}] local port 16393 interface en0(bound)
identityservicesd: (libboringssl.dylib) [com.apple.network:boringssl] boringssl_session_apply_protocol_options_for_transport_block_invoke(2007) [C383:1][0x125f17eb0] TLS configured [min_version(0x0304) max_version(0x0304) name(redacted) tickets(true) false_start(true) enforce_ev(false) enforce_ats(false) ech(false)]
containermanagerd: (ContainerManagerCommon) [com.apple.containermanager:fs] stat [<private>]: exists: 1, isDirectory: 0, fsNode: <~~~>
kernel: (1864790775): No more valid control units, disabling flow divert
kernel: (1864790775): Skipped all flow divert services, disabling flow divert
identityservicesd: (libboringssl.dylib) [com.apple.network:boringssl] boringssl_context_info_handler(2133) [C383:1][0x125f17eb0] Client handshake started
com.kithrup.SimpleTPP.Provider: (libnetworkextension.dylib) [com.apple.networkextension:] (1864790775): Closing reads (sending SHUT_WR), closed by plugin (flow error: 0)
com.kithrup.SimpleTPP.Provider: (libnetworkextension.dylib) [com.apple.networkextension:] (1864790775): Closing writes, sending SHUT_RD
com.kithrup.SimpleTPP.Provider: (libnetworkextension.dylib) [com.apple.networkextension:] (1864790775): Destroying, client tx 0, client rx 0, kernel rx 0, kernel tx 0
identityservicesd: (libboringssl.dylib) [com.apple.network:boringssl] boringssl_context_info_handler(2150) [C383:1][0x125f17eb0] Client handshake state: TLS client enter_early_data
Do you mean that when you have the Proxy Provider running that FaceTime is not working or FaceTime does not work at all?
When it is running, neither FaceTime video nor audio work. I can get it to connect outbound (haven't tried inbound recently, so I forget), but there's just no audio or video.
Let me get all of the log messages for a small time frame.
Well, I can do the same thing in Swift, it just wasn't as obvious, by using #if and typealias appropriately. It's not much uglier than in C, so that's fine.
The goal is to test things like error states, open/close states, and various callbacks. So I mocked up my own NWHostEndpoint (aka TestNWHostEndpoint), NEAppProxyFlow, etc. They're immensely simplistic, and won't test as well as the real thing, but I should be able to test my API fairly well.
(This was all motivated by the fact that my attempt to use DispatchIO crashes, every time, with enough load, and I can't figure out why. So maybe if I mock up a couple hours' worth of 100+ flows/sec, I might be able to track it down.)
Matt Easton over at https://forums.developer.apple.com/forums/thread/756508 said otherwise -- specifically,
For example, your flow copying logic relies on NEAppProxyFlow objects, but you could mock up a test object that represents the infrastructure in this class to just pipe network traffic into it to exercise your flow copying logic.
If it were in ObjC, I could do #if and/or #define tricks to include different headers, but I don't know how I'd do that with Swift. (And I can never remember how to define something in build settings or the command line and have Swift able to use #if for it, so I always have to go searching for it again.)
Ok, that is a dispatch thread limit; that's different from OS threads. That makes sense.
Alas.
I did figure out my design problem, which was creating all of the threads, however, the user-space proxy has a separate thread per network connection.
Fortunately that case didn't come into play -- I only .activate() and .cancel() it. The documentation wasn't really clear, but I assume a cancelled source can be released.
Also, as I said, that didn't appear to be my real problem, so I'm not sure what all was going on. I've rewritten the code that uses kevent into ObjC, to see if that helps (ie., I got very confused about conversions again).
Oh, hm. Ok, that's not it. It's when my object is deinit'ted at all. That seems very weird.