Watchdog Temination: all threads locked in libobjc

Over the past couple of weeks we've seen an increase in crashes on launch reported by users. On retrieval of these crashes from their devices (as they were not reported in Xcode) it became apparent that they were watchdog 0x8badf00d terminations.

Having consulted the documentation on Addressing Watchdog Terminations the crash reports suggest the app is mostly idle as the elapsed application CPU time is very small (0.134s) compared to the total elapsed CPU time (28.57s).

What seems concerning is that the reports show almost all threads are locked (__ulock_wait) in libobjc waiting on either class look-up (look_up_class) or method dispatch (_objc_msgSend_uncached/lookUpImpOrForward), bar two queues, one of which is waiting on another queue and the other of which is inspecting images loaded by dyld (likely hitting the same lock?).

There's nothing immediately apparent suggesting our app is doing anything to trigger this (the main thread is simply waiting on an object to be initialised). All crash reports present the exact same series of frames on the main thread which suggests it's not something happening beforehand taking a large amount of time (which I'd expect to be more variable).

For all users re-opening the app again it works fine. Heres one of the clearest crash reports (we've seen it occurring on iOS 14.2 and above).

Any ideas what might be happening here?


Potentially related with no replies: https://developer.apple.com/forums/thread/127335 . Thread 12 and 19 in the crash report linked in the original post exhibit a similar pattern of frames.
With regards the crash report you posted, I suspect that you’re deadlocking as follows:
  • Thread 12 is holding the Objective-C runtime lock (frame 5) and attempt to acquire the dyld lock (frame 2).

  • Thread 19 is holding the dyld lock (frame 6) and attempting to acquire the Objective-C runtime lock (frame 2).

It’s hard to be 100% sure though. Still, I think this crash report is enough evidence for you to file a bug about this. Please post your bug number, just for the record.

If you’d like me, or one of my colleagues, to dig into this some more, you should open a DTS tech support incident asking for a workaround to the bug you filed.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@apple.com"
I believe I've identified the issue and indeed Thread 12 and 19 seem to be the culprits, specifically a deadlock between them. The issue apperars to be caused by the interaction between libobjc and libdyld. When I reference files/line numbers here I'm looking at the source for objc4-818.2 and dyld-832.7.1 which I believe are the latest open source versions available.

As a reminder, this is what threads 12 and 19's frames looked like:

Code Block
Thread 12 name: Dispatch queue: com.google.FIRCoreDiagnostics
Thread 12:
0 libsystem_kernel.dylib 0x00000001c6703f5c ulock_wait + 8
1 libsystem_platform.dylib 0x00000001e2c150cc _os_unfair_lock_lock_slow + 196
2 libdyld.dylib 0x0000000199445ff4 dyld3::AllImages::infoForImageMappedAt(void const*, void + 57332 (dyld3::LoadedImage const&, unsigned char) block_pointer) const + 204
3 libdyld.dylib 0x0000000199445ec0 dyld3::AllImages::pathForImageMappedAt+ 57024 (void const*) const + 368
4 libdyld.dylib 0x000000019944b658 dyld3::dyld_image_path_containing_address+ 79448 (void const*) + 60
5 libobjc.A.dylib 0x00000001add6b248 objc_copyImageNames + 152
6 App 0x00000001052389d8 FIRPopulateProtoWithNumberOfLinkedFrameworks + 17205720 (FIRCoreDiagnostics.m:483)
Thread 19 name: Dispatch queue: com.apple.CFNetwork.Connection
Thread 19:
0 libsystem_kernel.dylib 0x00000001c6703f5c ulock_wait + 8
1 libsystem_platform.dylib 0x00000001e2c150cc _os_unfair_lock_lock_slow + 196
2 libobjc.A.dylib 0x00000001add6c174 lookUpImpOrForward + 152
3 libobjc.A.dylib 0x00000001add56524 _objc_msgSend_uncached + 68
4 libxpc.dylib 0x00000001e2c5a604 -[OS_xpc_object dealloc] + 56
5 libdyld.dylib 0x0000000199445404 invocation function for block in dyld3::AllImages::runImageCallbacks+ 54276 (dyld3::Array<dyld3::LoadedImage> const&) + 820
6 libdyld.dylib 0x00000001994449a0 dyld3::AllImages::runImageCallbacks+ 51616 (dyld3::Array<dyld3::LoadedImage> const&) + 172
7 libdyld.dylib 0x000000019944a2f0 dyld3::AllImages::loadImage+ 74480 (Diagnostics&, char const*, unsigned int, dyld3::closure::DlopenClosure const*, bool, bool, bool, bool, void const*) + 744
8 libdyld.dylib 0x0000000199449e2c dyld3::AllImages::dlopen+ 73260 (Diagnostics&, char const*, bool, bool, bool, bool, bool, void const*, bool) + 904
9 libdyld.dylib 0x000000019944bd14 dyld3::dlopen_internal+ 81172 (char const*, int, void*) + 372
10 libdyld.dylib 0x000000019943dd44 dlopen_internal+ 23876 (char const*, int, void*) + 112
11 libnetwork.dylib 0x000000019a8b6a6c __nw_protocol_get_tcp_image_block_invoke + 64
20 libnetwork.dylib 0x000000019a65ea94 nw_parameters_create_secure_tcp + 4672
21 CFNetwork 0x0000000199ebaf88 0x199e00000 + 765832


On thread 19 dlopen_internal has been called, in relation to CFNetwork, this calls into libdyld and in dyld3::AllImages::runImageCallbacks libdyld's global lock (AllImages.h:255) is taken through a call to withNotifiersLock (AllImages.cpp:382).

As part of runImageCallbacks there are a number of notifier blocks to be called, before or during this occurring we context switch to thread 12.

Thread 12 calls objc_copyImageNames, this locks libobjc's runtime lock (objc-runtime-new.mm:5521) before calling into libdyld through a call to fname (objc-runtime-new.mm:5543, defined in objc-private.h:505). This call into libdyld goes on to eventually call into AllImages::infoForImageMappedAt where it attempts to take libdyld's global lock (AllImages.cpp:678) through a call to withReadLock. This thread (12) now waits here as thread 19 has already acquired libdyld's global lock.

Both libdyld's global lock and libobjc's runtime lock are now locked by separate threads (19 and 12 respectively).

We context switch back to thread 19, the notifier blocks are now called, one of which calls dealloc on an OS_xpc_object, within which a call that requires Objective-C method dispatch is made, calling into lookUpImpOrForward which then attempts to take libobjc’s runtime lock (objc-runtime-new.mm:6427) but cannot as it is already locked by thread 12.

Thread 19 is now waiting on a lock acquired by thread 12 (libobjc's runtime lock) and thread 12 is now waiting on a lock acquired by thread 19 (libdyld's global lock). We're now deadlocked and the system watchdog eventually terminates the process.

I've filed this as bug report FB8971497. Let me know if this sounds about right though.

I suspect we’ve begun to see this issue occurring more in our app recently as we’ve began modularising which has increased the number of dynamic libraries that we link. As this is a concurrency & timing reliant deadlock it’s potentially the case that the increased number of dynamic libraries is resulting in thread 12's (com.google.FIRCoreDiagnostics) call to objc_copyImageNames taking longer when previously it would have finished by the time thread 19 calls dlopen.

I've filed this as bug report FB8971497.

Thanks for that. I took a quick look and it’s being routed appropriately.

Let me know if this sounds about right though.

Yeah.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@apple.com"
Watchdog Temination: all threads locked in libobjc
 
 
Q