Hi,
I'm using the Network framework to browse for devices on the local network. Unfortunately, I'm getting many crash reports that show a crash in nw_browser_cancel; two of them are attached.
This discussion seems to have a similar issue, but it was never resolved: https://forums.developer.apple.com/forums/thread/696037
Contrary to the situation in the linked thread, my implementation uses DispatchQueue.main as the queue for the browser, so I don't think over-releasing the queue is the problem.
I am unable to reproduce this problem myself, but one of my users can apparently reproduce it reliably.
How can I resolve this crash?
I’m looking at your second crash, which is easier for me to investigate because it’s on 18.1. In that I see this:
Thread 0 name:
Thread 0 Crashed:
0 libdispatch.dylib … dispatch_async + 192 (queue.c:944)
1 Network … nw_browser_set_state_locked(NWConcrete_nw_browser*, nw_browser_state_t, NSObject*) + 560 (browser.cpp:406)
2 Network … nw_browser_cancel + 484 (browser.cpp:1963)
3 MotionMount … LanDiscoveryService.stopDiscovery() + 4 (LanDiscoveryService.swift:41)
Your code (frame 3) called nw_browser_cancel (frame 2), which is setting the state to nw_browser_state_cancelled (frame 1), which is trying to deliver the state change to your state update handler.
Disassembling dispatch_async I see this:
(lldb) disas -n dispatch_async
libdispatch.dylib`dispatch_async:
…
0x19ad2ea7c <+192>: ldr w9, [x19, #0x54]
Note the instruction at +192 is accessing 0x54 bytes off x19. That matches the crashing memory address:
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Subtype: KERN_INVALID_ADDRESS at 0x0000000000000054
assuming that x19 is zero, which it is:
Thread 0 crashed with ARM Thread State (64-bit):
…
x16: 0x000000019a70c11c … x19: 0x0000000000000000
Looking further up the disassembly I see this:
(lldb) disas -n dispatch_async
libdispatch.dylib`dispatch_async:
0x19ad2e9bc <+0>: pacibsp
0x19ad2e9c0 <+4>: stp x22, x21, [sp, #-0x30]!
0x19ad2e9c4 <+8>: stp x20, x19, [sp, #0x10]
0x19ad2e9c8 <+12>: stp x29, x30, [sp, #0x20]
0x19ad2e9cc <+16>: add x29, sp, #0x20
0x19ad2e9d0 <+20>: mov x21, x1
0x19ad2e9d4 <+24>: mov x19, x0
…
0x19ad2ea7c <+192>: ldr w9, [x19, #0x54]
At +24 it sets x19 to x0, where x0 is the first input parameter. So Network framework has called dispatch_async with a NULL queue parameter! That’s not good.
Originally I thought that this must be some sort of race condition or memory corruption issue, but after staring at the code for a while I believe that it’s a logic bug in nw_browser. If you build and run this code, you’ll see the same crash:
nw_browse_descriptor_t descriptor = nw_browse_descriptor_create_bonjour_service("_ssh._tcp", nil);
nw_parameters_t parameters = nw_parameters_create();
nw_browser_t browser = nw_browser_create(descriptor, parameters);
nw_browser_set_state_changed_handler(browser, ^(nw_browser_state_t state, nw_error_t _Nullable error) {
// do nothing
});
// nw_browser_set_queue(browser, dispatch_get_main_queue());
nw_browser_cancel(browser);
Note the commented out line, meaning that the code sets a state update handler but doesn’t set a queue. So when nw_browser_cancel goes to set the state to nw_browser_state_cancelled, nw_browser_set_state_locked tries to call the state update handler on… well… no queue.
I filed my own bug report about this (r. 139710124).
I’m not sure if that’s the only cause of this bug, but I recommend that you audit your code to make sure it can’t ever trigger this bug.
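As a starting point for that audit, here is a minimal Swift sketch of one way to make sure cancel is never called on a browser that has not been given a queue. In the Swift API the queue is supplied via start(queue:), so the sketch simply refuses to cancel a browser that was never started. The GuardedBrowser type and its hasStarted and isCancelled flags are hypothetical names made up for illustration, not part of Network framework or of your LanDiscoveryService, and the sketch assumes all calls come from the main thread.

```swift
import Foundation
import Network

/// Hypothetical wrapper (not from the original post) that only cancels a
/// browser after it has been given a queue via start(queue:).
/// Assumption: start() and stop() are always called from the main thread,
/// so the flags need no extra locking.
final class GuardedBrowser {
    private let browser: NWBrowser
    private(set) var hasStarted = false
    private(set) var isCancelled = false

    init(serviceType: String) {
        browser = NWBrowser(for: .bonjour(type: serviceType, domain: nil),
                            using: NWParameters())
        // A state update handler is installed up front, which is the
        // situation in which cancelling without a queue is a problem.
        browser.stateUpdateHandler = { state in
            print("browser state: \(state)")
        }
    }

    func start() {
        // Start at most once, and never after a cancel.
        guard !hasStarted, !isCancelled else { return }
        hasStarted = true
        browser.start(queue: .main)
    }

    func stop() {
        // Skip the cancel if no queue was ever set, or if already cancelled.
        guard hasStarted, !isCancelled else { return }
        isCancelled = true
        browser.cancel()
    }
}
```

With this in place, a stopDiscovery() that runs before startDiscovery() becomes a no-op rather than a call into nw_browser_cancel with no queue set. Whether this covers every path in your app is something only the audit can tell you.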
Share and Enjoy
—
Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"