Network framework crashes from nw_browser_cancel call

Hi,

I'm using the Network framework to browse for devices on the local network. Unfortunately, I'm receiving many crash reports that point to nw_browser_cancel; two of them are attached.

This discussion seems to have a similar issue, but it was never resolved: https://forums.developer.apple.com/forums/thread/696037

Contrary to the situation in the linked thread, my implementation uses DispatchQueue.main as the queue for the browser, so I don't think over-releasing the queue is the problem.

I am unable to reproduce this problem myself, but one of my users apparently can reproduce it reliably.

How can I resolve this crash?

Answered by DTS Engineer in 813797022

I’m looking at your second crash, which is easier for me to investigate because it’s on 18.1. In that I see this:

Thread 0 name:
Thread 0 Crashed:
0   libdispatch.dylib … dispatch_async + 192 (queue.c:944)
1   Network           … nw_browser_set_state_locked(NWConcrete_nw_browser*, nw_browser_state_t, NSObject*) + 560 (browser.cpp:406)
2   Network           … nw_browser_cancel + 484 (browser.cpp:1963)
3   MotionMount       … LanDiscoveryService.stopDiscovery() + 4 (LanDiscoveryService.swift:41)

Your code (frame 3) called nw_browser_cancel (frame 2), which is setting the state to nw_browser_state_cancelled (frame 1), which in turn is trying to deliver the state change to your state update handler.

Disassembling dispatch_async I see this:

(lldb) disas -n dispatch_async
libdispatch.dylib`dispatch_async:
    …
    0x19ad2ea7c <+192>: ldr    w9, [x19, #0x54]

Note the instruction at +192 is accessing 0x54 bytes off x19. That matches the crashing memory address:

Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
Exception Subtype: KERN_INVALID_ADDRESS at 0x0000000000000054

assuming that x19 is zero, which it is:

Thread 0 crashed with ARM Thread State (64-bit):
    …
   x16: 0x000000019a70c11c  …  x19: 0x0000000000000000

Looking further up the disassembly I see this:

(lldb) disas -n dispatch_async
libdispatch.dylib`dispatch_async:
    0x19ad2e9bc <+0>:   pacibsp 
    0x19ad2e9c0 <+4>:   stp    x22, x21, [sp, #-0x30]!
    0x19ad2e9c4 <+8>:   stp    x20, x19, [sp, #0x10]
    0x19ad2e9c8 <+12>:  stp    x29, x30, [sp, #0x20]
    0x19ad2e9cc <+16>:  add    x29, sp, #0x20
    0x19ad2e9d0 <+20>:  mov    x21, x1
    0x19ad2e9d4 <+24>:  mov    x19, x0
    …
    0x19ad2ea7c <+192>: ldr    w9, [x19, #0x54]

At +24 it sets x19 to x0, where x0 is the first input parameter. So Network framework has called dispatch_async with a NULL queue parameter! That’s not good.

Originally I thought that this must be some sort of race condition or memory corruption issue, but after staring at the code for a while I believe that it’s a logic bug in nw_browser. If you build and run this code, you’ll see the same crash:

nw_browse_descriptor_t descriptor = nw_browse_descriptor_create_bonjour_service("_ssh._tcp", nil);
nw_parameters_t parameters = nw_parameters_create();
nw_browser_t browser = nw_browser_create(descriptor, parameters);
nw_browser_set_state_changed_handler(browser, ^(nw_browser_state_t state, nw_error_t _Nullable error) {
    // do nothing
});
// nw_browser_set_queue(browser, dispatch_get_main_queue());
nw_browser_cancel(browser);

Note the commented-out line, meaning that the code sets a state update handler but doesn’t set a queue. So when nw_browser_cancel goes to set the state to nw_browser_state_cancelled, nw_browser_set_state_locked tries to call the state update handler on… well… no queue.

I filed my own bug report about this (r. 139710124).

I’m not sure if that’s the only cause of this bug, but I recommend that you audit your code to make sure it can’t ever trigger this bug.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Thank you for your detailed answer.

I have the following code in the LanDiscoveryService:

    init() {
        setupBrowser()
    }

    func startDiscovery() {
        if browser == nil { setupBrowser() }
        if browser!.state == .ready { return } //Already running
        browser!.start(queue: DispatchQueue.main)
    }

    func stopDiscovery() {
        browser?.cancel()
        browser = nil
    }

If I now add a browser?.cancel() call at the end of init(), I indeed get the exact same crash, so it seems that stopDiscovery() was called before startDiscovery(), the call that sets the queue. I'm uncertain how this can happen in my code, or why I initialised the browser in init(), so I've plenty of options to explore now to resolve this crash!

Many thanks!
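
For anyone auditing similar code, here is a minimal sketch of one way to make sure cancel() is only ever called on a browser that has been started, and therefore has a queue. It is not the original poster's actual fix. The isRunning flag, the makeBrowser() helper, the "_ssh._tcp" service type (borrowed from the reproduction above), and the .tcp parameters are all assumptions for illustration, and the sketch assumes both methods are only called from the main thread.

```swift
import Network

final class LanDiscoveryService {
    private var browser: NWBrowser?

    // Hypothetical flag: true only after start(queue:) has been called.
    private(set) var isRunning = false

    func startDiscovery() {
        guard !isRunning else { return } // already running

        // A cancelled browser cannot be restarted, so create a fresh one
        // for every discovery session instead of creating it in init().
        let browser = makeBrowser()
        browser.start(queue: .main)
        self.browser = browser
        isRunning = true
    }

    func stopDiscovery() {
        // Only cancel a browser that was started and so has a queue.
        if isRunning {
            browser?.cancel()
        }
        browser = nil
        isRunning = false
    }

    // Hypothetical helper; service type and parameters are placeholders.
    private func makeBrowser() -> NWBrowser {
        let browser = NWBrowser(for: .bonjour(type: "_ssh._tcp", domain: nil),
                                using: .tcp)
        browser.stateUpdateHandler = { state in
            // React to state changes here.
            _ = state
        }
        return browser
    }
}
```

With this shape, calling stopDiscovery() before startDiscovery() just clears the state and never reaches nw_browser_cancel, which is the path that crashed in the reports above.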
