Thanks, Quinn, for the quick reply and the API clarification.
My first suggestion here would be to get the other third party’s stuff out of the equation. That way you’re only dealing with code that you control and the OS.
We did test that. With only the Cisco VPN installed, everything works; with only our content filter installed, everything works. With both installed, the bug occurs. We also have our own DNS proxy, but because the system allows only one DNS proxy at a time, we can't enable ours when customers need the Cisco VPN to reach corporate resources.
With our DNS proxy enabled we see no issues either; however, our DNS proxy and our content filter are hosted in the same process, so our proxy's traffic is not filtered by our content filter. Cisco is hosted the same way: all three network extensions in one process.
Given this, would a DTS incident still be useful, or should I just open a Radar?
Example:
% dig 0f54fx204jpt.stspg-customer.com
;; Warning: Message parser reports malformed message packet.
; <<>> DiG 9.10.6 <<>> 0f54fx204jpt.stspg-customer.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4251
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: Message has 35 extra bytes at end
;; QUESTION SECTION:
;0f54fx204jpt.stspg-customer.com. IN A
;; Query time: 72 msec
;; SERVER: 2606:4700:4700::1111#53(2606:4700:4700::1111)
;; WHEN: Wed Aug 30 16:55:21 CDT 2023
;; MSG SIZE rcvd: 96
The response above should be 211 bytes. 96 bytes is exactly the peek data size that our filter requests.
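One way to confirm that a reply was cut off at the peek boundary, rather than corrupted some other way, is to walk the message the way a resolver does: the 12-byte header announces how many records follow, and a truncated buffer runs out before they are all consumed. A minimal Python sketch, assuming plain RFC 1035 wire format; `check_dns_message`, `build_reply` and the 192.0.2.x addresses are hypothetical test scaffolding, not the actual 211-byte reply captured above.

```python
import struct


class TruncatedDNSMessage(Exception):
    """Raised when the header promises more data than the buffer holds."""


def _skip_name(buf: bytes, offset: int) -> int:
    """Return the offset just past an encoded domain name (RFC 1035, 4.1.4)."""
    while True:
        if offset >= len(buf):
            raise TruncatedDNSMessage("name runs past end of buffer")
        length = buf[offset]
        if length == 0:                  # root label ends the name
            return offset + 1
        if length & 0xC0 == 0xC0:        # 2-byte compression pointer ends the name
            if offset + 2 > len(buf):
                raise TruncatedDNSMessage("pointer runs past end of buffer")
            return offset + 2
        offset += 1 + length             # length byte plus label bytes


def check_dns_message(buf: bytes) -> dict:
    """Walk every section the header announces; raise if the buffer is short."""
    if len(buf) < 12:
        raise TruncatedDNSMessage("shorter than the 12-byte header")
    msg_id, _flags, qd, an, ns, ar = struct.unpack("!6H", buf[:12])
    offset = 12
    for _ in range(qd):
        offset = _skip_name(buf, offset) + 4          # QTYPE + QCLASS
        if offset > len(buf):
            raise TruncatedDNSMessage("question section truncated")
    for _ in range(an + ns + ar):
        offset = _skip_name(buf, offset)
        if offset + 10 > len(buf):                    # TYPE, CLASS, TTL, RDLENGTH
            raise TruncatedDNSMessage("record header truncated")
        (rdlength,) = struct.unpack("!H", buf[offset + 8:offset + 10])
        offset += 10 + rdlength
        if offset > len(buf):
            raise TruncatedDNSMessage("record data truncated")
    return {"id": msg_id, "answers": an, "consumed": offset, "size": len(buf)}


def build_reply(name: str, addrs: list) -> bytes:
    """Hypothetical test helper: an A-record reply with one answer per address."""
    header = struct.pack("!6H", 4251, 0x8180, 1, len(addrs), 0, 0)
    qname = b"".join(bytes([len(p)]) + p.encode() for p in name.split(".")) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)       # QTYPE=A, QCLASS=IN
    answers = b"".join(
        struct.pack("!HHHIH", 0xC00C, 1, 1, 300, 4) + bytes(a)  # name = pointer to QNAME
        for a in addrs
    )
    return header + question + answers


if __name__ == "__main__":
    full = build_reply("0f54fx204jpt.stspg-customer.com",
                       [(192, 0, 2, i) for i in range(1, 6)])
    print(len(full), check_dns_message(full))
    try:
        check_dns_message(full[:96])    # what survives a 96-byte peek
    except TruncatedDNSMessage as exc:
        print("96-byte copy:", exc)
```

The synthetic reply with five A answers is 129 bytes; cut to 96 bytes, the walk fails inside the third answer's data while the header still claims five answers, which is consistent with the malformed-packet warning dig printed.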
Logs from our filter:
// dig query made
2023-08-30 16:55:21.908228 [New] 24132: D89B5B5D-793C-4940-D720-1111F6E50100 17
2023-08-30 16:55:21.908380 [DATA] Out: 24132: D89B5B5D-793C-4940-D720-1111F6E50100 60@0
// Cisco proxy query made
2023-08-30 16:55:21.910918 [New] 10474: A1F78912-846E-4BEC-978D-C6B4F6E028F3 17
2023-08-30 16:55:21.911470 [DATA] Out: 10474: A1F78912-846E-4BEC-978D-C6B4F6E028F3 60@0
// Cisco proxy response
2023-08-30 16:55:21.978536 [DATA] In: 10474: A1F78912-846E-4BEC-978D-C6B4F6E028F3 96@0
// dig response
2023-08-30 16:55:21.979407 [DATA] In: 24132: D89B5B5D-793C-4940-D720-1111F6E50100 96@0
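Pairing the Out and In records per flow makes the pattern easier to see: both flows' inbound data is exactly 96 bytes, and the dig flow's reply lands under a millisecond after the proxy flow's. A Python sketch of a summarizer for these logs; it assumes our reading of the format above (`[New] <pid>: <flow UUID> <IP protocol number>`, with 17 taken to be UDP, and `[DATA] In|Out: <pid>: <flow UUID> <length>@<offset>`). The regex and `summarize` are hypothetical tooling, not part of any Apple API.

```python
import re
from datetime import datetime

# ASSUMED format, read off the sample lines (our filter's own logging):
#   <date> <time> [New] <pid>: <flow UUID> <IP protocol number>
#   <date> <time> [DATA] In|Out: <pid>: <flow UUID> <length>@<offset>
LINE = re.compile(
    r"^(?P<ts>\S+ \S+) \[(?P<kind>New|DATA)\] (?:(?P<dir>In|Out): )?"
    r"(?P<pid>\d+): (?P<flow>[0-9A-F-]{36}) (?P<rest>\S+)$"
)

SAMPLE = """\
// dig query made
2023-08-30 16:55:21.908228 [New] 24132: D89B5B5D-793C-4940-D720-1111F6E50100 17
2023-08-30 16:55:21.908380 [DATA] Out: 24132: D89B5B5D-793C-4940-D720-1111F6E50100 60@0
2023-08-30 16:55:21.910918 [New] 10474: A1F78912-846E-4BEC-978D-C6B4F6E028F3 17
2023-08-30 16:55:21.911470 [DATA] Out: 10474: A1F78912-846E-4BEC-978D-C6B4F6E028F3 60@0
2023-08-30 16:55:21.978536 [DATA] In: 10474: A1F78912-846E-4BEC-978D-C6B4F6E028F3 96@0
2023-08-30 16:55:21.979407 [DATA] In: 24132: D89B5B5D-793C-4940-D720-1111F6E50100 96@0
""".splitlines()


def summarize(lines, peek_size=96):
    """Per flow: protocol, first Out to first In latency (ms), inbound sizes, peek hits."""
    flows = {}
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # skips annotations such as "// dig query made"
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
        f = flows.setdefault(m["flow"], {"pid": int(m["pid"]), "proto": None, "out": [], "in": []})
        if m["kind"] == "New":
            f["proto"] = int(m["rest"])  # assumed: IP protocol number, 17 = UDP
            continue
        if m["dir"] is None:
            continue  # a DATA line without a direction: not in the assumed format
        length, _, offset = m["rest"].partition("@")
        f[m["dir"].lower()].append((ts, int(length), int(offset)))

    report = {}
    for flow, f in flows.items():
        rtt_ms = None
        if f["out"] and f["in"]:
            rtt_ms = round((f["in"][0][0] - f["out"][0][0]).total_seconds() * 1000, 3)
        report[flow] = {
            "pid": f["pid"],
            "proto": f["proto"],
            "rtt_ms": rtt_ms,
            "in_bytes": [n for _, n, _ in f["in"]],
            "hit_peek_limit": any(n == peek_size for _, n, _ in f["in"]),
        }
    return report


if __name__ == "__main__":
    for flow, info in summarize(SAMPLE).items():
        print(flow, info)
```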
Does it dispatch requests to each in parallel or does it simply have a for(filter){request} serial loop?
I don’t know, but if you forced me to guess I’d say that the latter is the more likely option.
OK, as I feared. So not only do content filters have a head-of-line problem by design, but the kernel itself does too. We're back to the days of the OS X network funnel serializing all (IP) network traffic, but only when one or more third-party content filters are installed, and then users blame us for the performance issues.
Thanks for the insight, Quinn; at least I now have an idea of what might be happening.
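If the kernel really does ask each filter in turn, a flow's verdict latency is the sum of every filter's latency rather than the slowest one. A toy Python model of the two dispatch shapes, using asyncio as a stand-in for the kernel and invented per-filter latencies; it says nothing about how the kernel is actually implemented, which is the open question above.

```python
import asyncio
import time


async def ask_filter(name: str, latency_s: float) -> str:
    """Stand-in for one filter's verdict; the latency is a made-up number."""
    await asyncio.sleep(latency_s)
    return f"{name}: allow"


async def serial(filters):
    # for (filter) { request }: each request starts only after the previous returns
    return [await ask_filter(n, s) for n, s in filters]


async def parallel(filters):
    # fan out to every filter at once, then wait for all verdicts
    return await asyncio.gather(*(ask_filter(n, s) for n, s in filters))


def timed(dispatch, filters):
    """Run one dispatch strategy and return (verdicts, elapsed seconds)."""
    start = time.perf_counter()
    verdicts = asyncio.run(dispatch(filters))
    return verdicts, time.perf_counter() - start


if __name__ == "__main__":
    filters = [("ours", 0.05), ("cisco", 0.07)]   # hypothetical latencies
    for dispatch in (serial, parallel):
        verdicts, elapsed = timed(dispatch, filters)
        print(dispatch.__name__, verdicts, f"{elapsed * 1000:.0f} ms")
```

With these invented latencies the serial loop takes roughly 120 ms per flow and the fan-out roughly 70 ms; both still wait for every filter, so parallel dispatch only removes the summing, not the waiting.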
Now, as to whether that’ll actually improve the performance, that’s a very different question, one that doesn’t have an easy answer. My experience is that most networking code is I/O bound, so getting more CPUs to work on the problem doesn’t help.
It's not about getting more CPU time; it's so that independent requests don't have to wait on every request ahead of them.
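The point about independent requests is queueing, not CPU: if verdicts are dispatched one flow at a time, a quick DNS lookup that arrives just behind a slow flow finishes only after it. A deterministic toy model in Python; the flow names and millisecond figures are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    name: str
    arrival_ms: float
    verdict_ms: float  # hypothetical time the filters spend on this flow


def fifo_completion(flows):
    """Single serial dispatcher: a flow cannot start until earlier flows finish."""
    done, clock = {}, 0.0
    for flow in sorted(flows, key=lambda fl: fl.arrival_ms):
        clock = max(clock, flow.arrival_ms) + flow.verdict_ms
        done[flow.name] = clock
    return done


def independent_completion(flows):
    """Independent handling: each flow pays only its own verdict time."""
    return {flow.name: flow.arrival_ms + flow.verdict_ms for flow in flows}


if __name__ == "__main__":
    flows = [Flow("bulk-upload", 0, 500), Flow("dns", 1, 2), Flow("https", 2, 3)]
    print("fifo:       ", fifo_completion(flows))
    print("independent:", independent_completion(flows))
```

With these numbers the DNS lookup completes at 502 ms under FIFO dispatch versus 3 ms when handled independently; that gap, rather than CPU time, is what independent dispatch would buy.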
No. Remember that each filter is running in its own process. However, it wouldn’t surprise me if your profiling revealed serialisation bottlenecks within the NE infrastructure.
Correct, but the kernel must wait on all filters to complete. Does it dispatch requests to each in parallel or does it simply have a for(filter){request} serial loop?
Sounds like you may want a packet filter instead of a socket/connection filter.
SYSEXTs don't run in a user session, so what you want to do isn't possible, except perhaps indirectly via launchctl asuser <UID> ... (though I'm not sure that would actually use the console session). There's another indirect route too, a launch agent, but then why not just make your main app a launch agent?
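A minimal Python sketch of the launchctl route, assuming the caller already runs as root (as a system extension does) and is allowed to spawn processes, and assuming the owner of /dev/console is the logged-in console user. `console_uid`, `asuser_argv` and `run_in_console_session` are hypothetical helpers; whether the launched process really joins the console session is the same open question as above.

```python
import os
import subprocess


def console_uid() -> int:
    """ASSUMPTION: the owner of /dev/console is the user logged in at the console."""
    return os.stat("/dev/console").st_uid


def asuser_argv(uid: int, command: list) -> list:
    """Build a `launchctl asuser <UID> <command...>` invocation (caller must be root)."""
    return ["/bin/launchctl", "asuser", str(uid), *command]


def run_in_console_session(command: list) -> subprocess.CompletedProcess:
    uid = console_uid()
    if uid == 0:
        # Assumed: root owning /dev/console means nobody is logged in at the console
        raise RuntimeError("no console user logged in")
    return subprocess.run(asuser_argv(uid, command), check=True)
```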
I have this issue right now on my dev machine; it looks as if there's stale information in /Library/SystemExtensions/db.plist. I've seen clients hit it too. On my machine, the duplicate disabled extension shows a lock icon.
Edit: For me and our clients, only the DNS proxy ever shows a zombie entry; our socket filter never does, even though both are hosted in the same SYSEXT.
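To check for the stale duplicate without eyeballing System Settings, one can group the entries in db.plist by bundle identifier. A read-only Python sketch; the layout it expects (a top-level `extensions` array whose entries carry `identifier` and `state` keys) is an assumption about an undocumented file, and `systemextensionsctl list` remains the supported view of the same state.

```python
import plistlib
from collections import defaultdict


def find_duplicate_extensions(db: dict) -> dict:
    """Map each identifier that appears more than once to the states recorded for it.

    ASSUMPTION: db.plist has a top-level "extensions" array whose entries carry
    "identifier" and "state" keys. The file format is undocumented.
    """
    states = defaultdict(list)
    for ext in db.get("extensions", []):
        states[ext.get("identifier", "?")].append(ext.get("state", "?"))
    return {ident: s for ident, s in states.items() if len(s) > 1}


def load_db(path: str = "/Library/SystemExtensions/db.plist") -> dict:
    """Read the database without modifying it."""
    with open(path, "rb") as fh:
        return plistlib.load(fh)


if __name__ == "__main__":
    for ident, s in find_duplicate_extensions(load_db()).items():
        print(ident, s)
```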
Hi Matt, yes: if no other NSYSEXT is on the system, there are no problems. We're only seeing this issue with this particular client; so far no other client is running another NSYSEXT. We also haven't been able to reproduce the issue internally (again, with no other NSYSEXTs installed).
Thanks for the reply, Matt. Here are the logs from the client:
2022-06-13 12:48:20.573425-0400 0x3d35 Default 0x0 1166 0 com.uptycs.kringle.daemon: (NetworkExtension) [com.apple.networkextension:] (0): Creating a new flow director
2022-06-13 12:48:20.573623-0400 0x3d35 Default 0x0 1166 0 com.uptycs.kringle.daemon: (NetworkExtension) [com.apple.networkextension:] [Extension com.uptycs.kringle]: Calling startProxyWithOptions with options 0x0
2022-06-13 12:48:20.573639-0400 0x3d35 Default 0x0 1166 0 com.uptycs.kringle.daemon: [com.uptycs.kringle:dns-proxy] start
2022-06-13 12:48:20.576279-0400 0x3d35 Default 0x0 1166 0 com.uptycs.kringle.daemon: [com.uptycs.kringle:dns-proxy] ready
...
2022-06-13 12:48:28.222215-0400 0x3cfa Default 0x0 158 0 sysextd: [com.apple.sx:XPC] client activation request for com.cisco.anyconnect.macos.acsockext
2022-06-13 12:48:28.264605-0400 0x3dc4 Default 0x0 262 0 nesessionmanager: (NetworkExtension) [com.apple.networkextension:] Clearing 42C1466A-D643-4CCB-9B29-A0FDF2B57F03 from the loaded configurations
2022-06-13 12:48:28.275395-0400 0x3dc6 Default 0x0 262 0 nesessionmanager: [com.apple.networkextension:] <NESMServer: 0x7ff0b3d047b0>: Deregister DNS Proxy Session: NESMDNSProxySession[Primary Tunnel:Uptycs Protect DNS Proxy:42C1466A-D643-4CCB-9B29-A0FDF2B57F03:(null)]
2022-06-13 12:48:28.275411-0400 0x3bcc Default 0x0 262 0 nesessionmanager: [com.apple.networkextension:] Registering session NESMDNSProxySession[Primary Tunnel:Cisco AnyConnect Socket Filter:FA292875-ADE4-4304-9423-E4527401CBAA:(null)]
2022-06-13 12:48:28.276187-0400 0x3d6a Default 0x0 1166 0 com.uptycs.kringle.daemon: (NetworkExtension) [com.apple.networkextension:] [Extension com.uptycs.kringle]: Calling stopProxyWithReason because: Configuration was disabled
2022-06-13 12:48:28.276190-0400 0x3d6a Default 0x0 1166 0 com.uptycs.kringle.daemon: [com.uptycs.kringle:dns-proxy] stop: 9
The logs show that our DNS proxy (com.uptycs.kringle.daemon) starts, and about 8 seconds later the Cisco NSYSEXT is started and our proxy is stopped with reason 9 (NEProviderStopReasonConfigurationDisabled). The Cisco NSYSEXT contains a socket filter and a DNS proxy. Our SYSEXT also contains a socket filter and a DNS proxy, and only our DNS proxy is being stopped.
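Stop reasons appear in logs as raw integers, so a small lookup makes lines like the `stop: 9` above self-explanatory. A Python sketch: the table reproduces the raw values 0 to 16 of NEProviderStopReason from the NetworkExtension headers (including Apple's own spelling `superceded`), and the regex assumes the `[subsystem] stop: <code>` message that our provider logs, a format specific to our code.

```python
import re

# NEProviderStopReason raw values 0..16 (NetworkExtension, NEProvider.h)
STOP_REASONS = {
    0: "none", 1: "userInitiated", 2: "providerFailed", 3: "noNetworkAvailable",
    4: "unrecoverableNetworkChange", 5: "providerDisabled",
    6: "authenticationCanceled", 7: "configurationFailed", 8: "idleTimeout",
    9: "configurationDisabled", 10: "configurationRemoved", 11: "superceded",
    12: "userLogout", 13: "userSwitch", 14: "connectionFailed", 15: "sleep",
    16: "appUpdate",
}

# Matches our provider's own "[<subsystem>] stop: <code>" message (our format, not Apple's)
STOP_LINE = re.compile(r"\[(?P<subsystem>[^\]]+)\] stop: (?P<code>\d+)\s*$")


def decode_stop(line: str):
    """Return (subsystem, code, reason name) for a stop line, or None."""
    m = STOP_LINE.search(line)
    if not m:
        return None
    code = int(m["code"])
    return m["subsystem"], code, STOP_REASONS.get(code, f"unknown({code})")


if __name__ == "__main__":
    print(decode_stop("... com.uptycs.kringle.daemon: [com.uptycs.kringle:dns-proxy] stop: 9"))
```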
Thanks, Quinn. I understand that symlinks are quite different and that there's no path validation on them, but ES is a replacement for BSM, and BSM reported symlink events, so it seems ES should too. Also, ES should support user login/logout events as BSM does (FeedbackID:FB9103833).
Did you add the com.apple.developer.endpoint-security.client entitlement? You also have to disable AMFI; otherwise it will kill the daemon if it isn't signed.
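For reference, the entitlement goes into the daemon's .entitlements property list, which is then passed to `codesign --entitlements`. A Python sketch that produces that file with the standard `plistlib` module; the file name in the usage note is hypothetical, and as noted above the entitlement is only honored for a properly signed daemon.

```python
import plistlib

# The Endpoint Security client entitlement mentioned above
ES_ENTITLEMENTS = {"com.apple.developer.endpoint-security.client": True}


def entitlements_xml(entitlements: dict) -> bytes:
    """Serialize entitlements as an XML property list."""
    return plistlib.dumps(entitlements, fmt=plistlib.FMT_XML)


def write_entitlements(path: str, entitlements: dict) -> None:
    """Write an .entitlements file suitable for codesign --entitlements <path>."""
    with open(path, "wb") as fh:
        fh.write(entitlements_xml(entitlements))
```

For example, `write_entitlements("Daemon.entitlements", ES_ENTITLEMENTS)` writes the file, which the signing step then references.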
Figured it out: I needed the com.apple.developer.networking.networkextension entitlement on both the SYSEXT and the container app. Here's hoping for some good documentation in the future.
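Because the failure came down to the entitlement being missing from one of the two targets, a small pre-signing check can catch it. A Python sketch, assuming each target's .entitlements file is available as bytes; the target names are hypothetical, and the `-systemextension` provider values shown are the ones used for Developer ID-signed system extensions (a content filter plus a DNS proxy, matching the setup described earlier in the thread).

```python
import plistlib

NE_KEY = "com.apple.developer.networking.networkextension"


def missing_ne_values(targets: dict, required: set) -> dict:
    """For each target, list the required NE provider types its entitlements lack.

    `targets` maps a (hypothetical) target name to its raw .entitlements bytes.
    """
    gaps = {}
    for name, raw in targets.items():
        have = set(plistlib.loads(raw).get(NE_KEY, []))
        if required - have:
            gaps[name] = sorted(required - have)
    return gaps


if __name__ == "__main__":
    required = {"content-filter-provider-systemextension", "dns-proxy-systemextension"}
    app = plistlib.dumps({NE_KEY: sorted(required)})
    sysext = plistlib.dumps({})  # the mistake described above: key missing here
    print(missing_ne_values({"ContainerApp": app, "SysExt": sysext}, required))
```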