Thanks, this sheds some light on (1), although it doesn't answer (2), (3) and (4) in my original post.
I'd be glad to know more about that library cache / system partition thing. Could I see that partition? Does it contain the whole "***.framework" folders, or just the dylib files that were stripped from the "original" location? As this is no longer traditional Unix, why does "dli_fname" still contain something that resembles a valid path instead of, say, an empty string? Given the exceptions in (4), normal library locations are still supported, so this looks like an overly complicated setup: for starters, you need to support both ways instead of just one.
Taking a step back, why are you trying to dlopen system libraries?
That's just a test setup that reproduces the difference in behaviour in this particular test. Basically I need the library file name, file size and its modification date, ideally the checksum but can do with the first three, so dlsym (with one of the magic constants for the file specification) + dladdr + stat or similar API should be enough, and dlopen is not needed.
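For reference, the dlsym + dladdr + file attributes approach described above could look roughly like the following. This is a minimal sketch rather than a verified solution: `ImageFileInfo` and `imageFileInfo(forSymbol:)` are hypothetical names, `malloc` is only an example symbol, and FileManager stands in for stat. The size and date come back as nil whenever the path reported in dli_fname cannot be read from the file system.

```swift
import Foundation

// Hypothetical helper type, not part of any system API.
struct ImageFileInfo {
    let path: String      // dli_fname as reported by dladdr
    let size: UInt64?     // nil if the path is not readable on disk
    let modified: Date?   // nil if the path is not readable on disk
}

func imageFileInfo(forSymbol name: String) -> ImageFileInfo? {
    // RTLD_DEFAULT is a C macro that Swift does not import.
    #if canImport(Darwin)
    let handle = UnsafeMutableRawPointer(bitPattern: -2)   // Darwin's RTLD_DEFAULT
    #else
    let handle: UnsafeMutableRawPointer? = nil             // glibc's RTLD_DEFAULT
    #endif

    // Find the symbol, then ask dyld which image it belongs to.
    guard let address = dlsym(handle, name) else { return nil }
    var info = Dl_info()
    guard dladdr(address, &info) != 0, let fname = info.dli_fname else { return nil }

    let path = String(cString: fname)
    // attributesOfItem throws if nothing exists at that path.
    let attrs = try? FileManager.default.attributesOfItem(atPath: path)
    return ImageFileInfo(path: path,
                         size: (attrs?[.size] as? NSNumber)?.uint64Value,
                         modified: attrs?[.modificationDate] as? Date)
}

if let info = imageFileInfo(forSymbol: "malloc") {
    print(info.path, info.size as Any, info.modified as Any)
}
```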
Ah, sorry, I must have misread your answer in the original thread! I thought you were implying that while there is a limit of 64K on sockets (which translates to a corresponding limit on the maximum number of simultaneous connections), this limit no longer applies now that we have user-space networking. The logical path I'd been following was: since there are still situations where user-space networking is inactive, we still have the 64K sockets -> connections issue! I must admit I assumed quite a lot from what you didn't say explicitly.
So instead of assuming any further: please confirm there is no issue with having more than 64K connections even in cases where user-space networking is not active. At worst I'd have to call setrlimit to bump the count beyond 64K (which is a non-privileged, App Store-compatible call).
Thanks!
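The setrlimit call mentioned above could be sketched as follows. This is a hedged sketch: `raiseOpenFileLimit(to:)` is a hypothetical helper name, and the system may reject values above its own per-process cap, so the function re-reads the limit and returns what is actually in effect instead of assuming success.

```swift
import Foundation

/// Tries to raise the soft RLIMIT_NOFILE limit to `desired` (capped at the hard limit)
/// and returns the soft limit that is in effect afterwards (0 if it cannot be read).
func raiseOpenFileLimit(to desired: rlim_t) -> rlim_t {
    // glibc imports RLIMIT_NOFILE as an enum value; Darwin imports it as Int32.
    #if canImport(Glibc)
    let resource = __rlimit_resource_t(RLIMIT_NOFILE.rawValue)
    #else
    let resource = RLIMIT_NOFILE
    #endif

    var limit = rlimit()
    guard getrlimit(resource, &limit) == 0 else { return 0 }

    let target = min(desired, limit.rlim_max)
    if target > limit.rlim_cur {
        var wanted = limit
        wanted.rlim_cur = target
        if setrlimit(resource, &wanted) != 0 {
            perror("setrlimit")   // the request was refused; the old limit stays
        }
    }

    // Re-read so the caller sees the real value, whether or not the call succeeded.
    guard getrlimit(resource, &limit) == 0 else { return 0 }
    return limit.rlim_cur
}

print("soft limit now:", raiseOpenFileLimit(to: 100_000))
```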
I see, thank you!
Where does that leave us... There are currently a few things that can stop user-space networking from working. Imagine asking users to go through a number of steps, like switching the firewall off, to make your app work – not many users will like doing that.
Note As with everything we’ve been talking about on this thread, this is an implementation detail that could well change in the future.
I hope it will!
I just tried it here in my office and, yeah, my test tool starts using sockets as soon as I enable the firewall.
On macOS 14 as well?
Thank you Quinn,
I haven't tried a VM yet.
Just now I tried removing both the Network Link Conditioner and the VPN, then restarting – no joy, same outcome.
If I don't use iCloud+ (and as for iCloud, I only use FindMy, with everything else disabled), should I be worried about iCloud Private Relay?
How do I switch that off?
Wow, thank you. What would you recommend I do to troubleshoot and find the culprit that causes a different outcome on my computer? Other than an inactive VPN I also have an inactive Network Link Conditioner; otherwise I can't think of anything else. Could either of those two things affect the outcome?
I could write an app that tests the various conditions that must be met for user-space networking to work properly (so long as those can be tested with public APIs and I know what calls to make :)
My hardware: 2021 16 inch MacBook Pro, running macOS 13.6.3 (22G436)
Hi, I'd really appreciate if anyone here could run my self-contained test above on their current macOS version to see if it matches my results or not. If you do please share your result here indicating the OS version and whether you have VPN installed or not (and whether it's activated or disabled).
Thank you Quinn!
I tried your test two different ways:
To run it in a UI app I slightly modified it (moved connections to a global and increased the count to 500):
var connections: [NWConnection] = []
func quinnTest() throws {
    connections = (0..<500).map { _ in start() }
}
// call quinnTest() from view's init()
The app logs "Too many open files" errors,
"lsof" lists 251 entries like these:
NetTest 62541 yam 252u IPv4 0xcbd9eab08f46aaf1 0t0 TCP yam-mac:53997->93.184.215.14:http (ESTABLISHED)
there's no flowsw in the list
skywalkctl returns an empty list:
Proto Local Address Remote Address InBytes OutBytes InPkts/InSPkts OutPkts/OutSPkts SvC NetIf Port Adv Flags Local State Remote State Local RTT Remote RTT Process.PID
I also tried your test unmodified. The results are very similar to (1), the UI app run: many socket entries in lsof, no flowsw entry, and an empty list in skywalkctl. I am not getting "too many files" in your test, though, for two reasons: 10 is not a big number, and getrlimit+RLIMIT_NOFILE by default returns a significantly higher number in console apps than in UI apps.
Other than that:
I do not use iCloud+, and as for iCloud I use FindMy only, with the rest of the iCloud features turned off.
I have a VPN installed but it is switched off; could it still matter?
I am on macOS 13.6.3 and will recheck on macOS 14 when possible.
I wonder what results you get when you run my test, if you are able to.
Could it be that this feature (not using a socket per NWConnection) is available on macOS 14+ only?
Do I somehow have to opt out of using sockets, or should they not be used by default?
When running this test app:
import SwiftUI
import Network

var activeConnections: [NWConnection] = [] {
    didSet {
        print("connection count: \(activeConnections.count)")
    }
}

let site = "www.apple.com"

func openNewConnections(count: Int) {
    if count <= 0 { return }
    let c = NWConnection(host: NWEndpoint.Host(site), port: .https, using: .tls)
    activeConnections.append(c)
    c.stateUpdateHandler = { state in
        switch state {
        case .cancelled: fatalError("• cancelled")
        case .failed(let error): fatalError("• Error: \(error)")
        case .setup: break
        case .waiting: break
        case .preparing: break
        case .ready:
            c.send(content: "GET https://\(site)/index.html HTTP/1.0\n\n".data(using: .utf8), completion: NWConnection.SendCompletion.contentProcessed { error in
                if let error {
                    fatalError("• send ended with Error: \(error)")
                }
            })
            c.receive(minimumIncompleteLength: 1, maximumLength: 20) { data, contentContext, isComplete, error in
                if let data {
                    if let s = String(data: data, encoding: .utf8) {
                        openNewConnections(count: count - 1)
                    } else {
                        openNewConnections(count: count - 1)
                    }
                } else {
                    fatalError("• Error: \(String(describing: error))")
                }
            }
        @unknown default:
            fatalError("TODO")
        }
    }
    c.start(queue: .main)
}

@main struct NetTestApp: App {
    init() {
        var r = rlimit()
        let err = getrlimit(RLIMIT_NOFILE, &r)
        precondition(err == 0)
        print("max files:", r.rlim_cur)
        openNewConnections(count: 20_000)
    }
    var body: some Scene {
        WindowGroup { Text("Hello, World") }
    }
}
I'm hitting the "too many open files" error close to the number reported by getrlimit + RLIMIT_NOFILE (on the 249th connection, with 256 max files), and I can see the word "socket" in the log, both in function names and in the "Failed to initialize socket" error message:
max files: 256
connection count: 1
...
connection count: 2
connection count: 246
connection count: 247
connection count: 248
nw_socket_initialize_socket <private> Failed to create socket(2,1) [24: Too many open files]
nw_socket_initialize_socket Failed to create socket(2,1) [24: Too many open files]
nw_socket_initialize_socket Failed to create socket(2,1) [24: Too many open files], dumping backtrace:
[arm64] libnetcore-3100.140.3
0 Network 0x00000001938e4564 __nw_create_backtrace_string + 192
1 Network 0x0000000193b0b164 _ZL27nw_socket_initialize_socketP11nw_protocol + 2008
2 Network 0x0000000193b2917c _ZL27nw_socket_add_input_handlerP11nw_protocolS0_ + 1416
3 Network 0x0000000193c901b4 nw_endpoint_flow_attach_socket_protocol + 380
4 Network 0x0000000193c800a0 nw_endpoint_flow_attach_protocols + 6492
5 Network 0x0000000193c7d304 nw_endpoint_flow_setup_protocols + 3664
6 Network 0x0000000193c989e8 -[NWConcrete_nw_endpoint_flow startWithHandler:] + 4092
7 Network 0x00000001937625ac nw_endpoint_handler_path_change + 9400
8 Network 0x000000<…>
nw_socket_add_input_handler [C248.1.1:2] Failed to initialize socket
I think the same would happen in the console app (where getrlimit + RLIMIT_NOFILE returns a higher number 7168 = 0x1C00) if I wait long enough (possibly make the test more robust first to handle the cancellation errors, etc).
I am on macOS 13.6.
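To check how much descriptor headroom a process has before it starts opening connections, something like the sketch below could be used. It assumes that probing each slot with fcntl is acceptable for a diagnostic; `openDescriptorCount()` is a hypothetical helper name. If the app already holds a handful of descriptors at launch, that would be consistent with the failure arriving at connection 249 rather than 256, though that explanation is an assumption.

```swift
import Foundation

/// Counts the descriptors currently open in this process by probing every slot
/// below the descriptor table size reported by getdtablesize().
func openDescriptorCount() -> Int {
    var count = 0
    for fd in 0..<getdtablesize() {
        // F_GETFD fails (returns -1) for slots that are not open.
        if fcntl(fd, F_GETFD) != -1 { count += 1 }
    }
    return count
}

let tableSize = Int(getdtablesize())
let inUse = openDescriptorCount()
print("descriptors in use: \(inUse) of \(tableSize), headroom: \(tableSize - inUse)")
```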
Edit: It's also not obvious how to recover from that error, as it is not reported back via the normal Swift error mechanism; it looks like it uses either the C++ or the Objective-C exception mechanism. The last NWConnection() call, the stateUpdateHandler assignment, and the "start" call all complete; then the OS internals log this error, and the update handler is called two more times (with "preparing" and "waiting" – normally "waiting" is not reported) but never with "ready". So the logic that handles the response or starts a new connection stops proceeding, without ever having a chance to handle or print the relevant error!
More specifically if I add more log entries I see this at the very end:
state: preparing, connectionCount: 248
// here goes string of errors
state: waiting, connectionCount: 248
and nothing else afterwards. For the previous connections before the file limit is hit – "ready" callout happens after "preparing" and without "waiting" callout.
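One thing that may be worth checking here: the test app's handler ignores the payload of the "waiting" state, but `NWConnection.State.waiting` carries an `NWError`. The sketch below logs it. Whether the "Too many open files" error is what actually arrives there in this scenario is an assumption that would need verifying; `describe(_:)` is a hypothetical helper name.

```swift
import Network

/// Hypothetical helper: renders a connection state, including its error if it has one.
func describe(_ state: NWConnection.State) -> String {
    switch state {
    case .setup: return "setup"
    case .preparing: return "preparing"
    case .ready: return "ready"
    case .waiting(let error): return "waiting: \(error)"   // error explains why the connection is stalled
    case .failed(let error): return "failed: \(error)"
    case .cancelled: return "cancelled"
    @unknown default: return "unknown"
    }
}

// Possible use inside the test app's handler:
// c.stateUpdateHandler = { state in
//     print("state: \(describe(state)), connectionCount: \(activeConnections.count)")
// }
```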
Well, my colleague naively assumed that since it was up to 1.5 TB 5 years ago, and modern servers can go up to 6 TB (if not more), the most powerful modern Mac Pros would be configurable up to a few TB at least. Apparently he assumed wrong. Unified memory (that's for quick GPU-CPU transfers?) shouldn't be important in his use case, as he is interested in the CPU only.
This whole discussion assumes that every network connection requires a socket. This isn’t the case on most Apple platforms, which have a user-space networking stack that you can access via the Network framework [1].
[1] The one exception here is macOS, where Network framework has to run through the kernel in order to support NKEs. This is one of the reasons we’re in the process of phasing out NKE support, starting with their deprecation in the macOS 10.15 SDK.
Hi Quinn, it's been a while since you wrote the above.
Do you know if macOS is still an unfortunate exception that requires a socket per Network framework's connection?
cc @eskimo
I see (and will file the bug). I was a bit nervous about running code very similar to yours on my computer and drawing far-reaching conclusions about what to expect in the ifa_data field in the wild:
the doc is buggy, but because it's buggy it's hard to tell what is correct in it and what's not (what to trust and what not). The most conservative approach would be: "if it's buggy – don't second guess it and don't trust it at all".
different OS versions / or (Apple) platforms might do things differently, and what happens on my device might not be the whole story.
special conditions on user devices (wifi/cellular? p2p? network link conditioner? vpn tunnels? usb wifi dongles?) might mess things further.
I'll have some (in fact strong [1]) reassurance once I see the actual source that populates the ifa_data field. Could you please point me to the right place, if that's publicly available?
[1] Normally it's unwise to jump to conclusions based on what's in the source: normally the doc is king and the implementation could be changed on a whim... In this particular case, though (1. the doc is buggy, so definitely not king; 2. the API is ancient, so it will hardly ever change significantly; 3. Hyrum's law is at play, which makes implementation changes even less likely), the source code seems to be the best "source of truth".
Thank you!
So there was no way indeed to download folders in there?!
🤣
concurrentPerform is a class method that has no queue parameter, so the wording in the documentation about "If the target queue is a concurrent queue..." is bogus, not just unclear.
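To illustrate the point: the call is made on the DispatchQueue type itself, so there is no target queue to speak of. A minimal sketch (`sumOfSquares(upTo:)` is a made-up example function):

```swift
import Foundation

/// Sums i*i for i in 0..<n, running the iterations in parallel.
func sumOfSquares(upTo n: Int) -> Int {
    let lock = NSLock()
    var total = 0
    // Class method: invoked on DispatchQueue itself, no queue instance or parameter.
    // It returns only after all iterations have finished.
    DispatchQueue.concurrentPerform(iterations: n) { i in
        lock.lock()
        total += i * i   // protect the shared accumulator
        lock.unlock()
    }
    return total
}

print(sumOfSquares(upTo: 4))   // 0 + 1 + 4 + 9 = 14
```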
Right... I'm still curious though how to properly read this specific part of the documentation:
For all other address families, it contains a pointer to the struct ifa_data (as defined in include file <net/if.h>) which contains per-address interface statistics.
as I do not see "struct ifa_data" anywhere...
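For contrast, the AF_LINK case that the same man page describes (ifa_data pointing to a struct if_data) can be read as in the sketch below. It is Darwin-only and deliberately touches ifa_data for AF_LINK entries alone, since what the field holds for other address families is exactly the open question here. `linkLayerByteCounts()` is a hypothetical helper name.

```swift
import Darwin

/// Returns per-interface byte counters taken from the AF_LINK entries of getifaddrs.
/// Assumes, per the getifaddrs man page, that ifa_data is a struct if_data for AF_LINK.
func linkLayerByteCounts() -> [String: (received: UInt32, sent: UInt32)] {
    var result: [String: (received: UInt32, sent: UInt32)] = [:]
    var head: UnsafeMutablePointer<ifaddrs>?
    guard getifaddrs(&head) == 0 else { return result }
    defer { freeifaddrs(head) }

    var cursor = head
    while let entry = cursor {
        defer { cursor = entry.pointee.ifa_next }   // advance on every path, including continue
        guard let addr = entry.pointee.ifa_addr,
              Int32(addr.pointee.sa_family) == AF_LINK,
              let data = entry.pointee.ifa_data else { continue }
        let stats = data.assumingMemoryBound(to: if_data.self).pointee
        result[String(cString: entry.pointee.ifa_name)] = (stats.ifi_ibytes, stats.ifi_obytes)
    }
    return result
}

for (name, counts) in linkLayerByteCounts() {
    print(name, "in:", counts.received, "out:", counts.sent)
}
```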