I believe I've identified the issue: threads 12 and 19 do seem to be the culprits, specifically a deadlock between them. The issue appears to be caused by the interaction between libobjc and libdyld. When I reference files and line numbers here I'm looking at the source for objc4-818.2 and dyld-832.7.1, which I believe are the latest open source versions available.
As a reminder, this is what threads 12 and 19's frames looked like:
Thread 12 name:	Dispatch queue: com.google.FIRCoreDiagnostics
Thread 12:
0	 libsystem_kernel.dylib				 0x00000001c6703f5c __ulock_wait + 8
1	 libsystem_platform.dylib			 0x00000001e2c150cc _os_unfair_lock_lock_slow + 196
2	 libdyld.dylib								 0x0000000199445ff4 dyld3::AllImages::infoForImageMappedAt(void const*, void + 57332 (dyld3::LoadedImage const&, unsigned char) block_pointer) const + 204
3	 libdyld.dylib								 0x0000000199445ec0 dyld3::AllImages::pathForImageMappedAt+ 57024 (void const*) const + 368
4	 libdyld.dylib								 0x000000019944b658 dyld3::dyld_image_path_containing_address+ 79448 (void const*) + 60
5	 libobjc.A.dylib							 0x00000001add6b248 objc_copyImageNames + 152
6	 App												 	0x00000001052389d8 FIRPopulateProtoWithNumberOfLinkedFrameworks + 17205720 (FIRCoreDiagnostics.m:483)
…
Thread 19 name:	Dispatch queue: com.apple.CFNetwork.Connection
Thread 19:
0	 libsystem_kernel.dylib				 0x00000001c6703f5c __ulock_wait + 8
1	 libsystem_platform.dylib			 0x00000001e2c150cc _os_unfair_lock_lock_slow + 196
2	 libobjc.A.dylib							 0x00000001add6c174 lookUpImpOrForward + 152
3	 libobjc.A.dylib							 0x00000001add56524 _objc_msgSend_uncached + 68
4	 libxpc.dylib									 0x00000001e2c5a604 -[OS_xpc_object dealloc] + 56
5	 libdyld.dylib								 0x0000000199445404 invocation function for block in dyld3::AllImages::runImageCallbacks+ 54276 (dyld3::Array<dyld3::LoadedImage> const&) + 820
6	 libdyld.dylib								 0x00000001994449a0 dyld3::AllImages::runImageCallbacks+ 51616 (dyld3::Array<dyld3::LoadedImage> const&) + 172
7	 libdyld.dylib								 0x000000019944a2f0 dyld3::AllImages::loadImage+ 74480 (Diagnostics&, char const*, unsigned int, dyld3::closure::DlopenClosure const*, bool, bool, bool, bool, void const*) + 744
8	 libdyld.dylib								 0x0000000199449e2c dyld3::AllImages::dlopen+ 73260 (Diagnostics&, char const*, bool, bool, bool, bool, bool, void const*, bool) + 904
9	 libdyld.dylib								 0x000000019944bd14 dyld3::dlopen_internal+ 81172 (char const*, int, void*) + 372
10	libdyld.dylib								 0x000000019943dd44 dlopen_internal+ 23876 (char const*, int, void*) + 112
11	libnetwork.dylib							 0x000000019a8b6a6c __nw_protocol_get_tcp_image_block_invoke + 64
…
20	libnetwork.dylib							 0x000000019a65ea94 nw_parameters_create_secure_tcp + 4672
21	CFNetwork										 0x0000000199ebaf88 0x199e00000 + 765832
…
On thread 19, dlopen_internal has been called in relation to CFNetwork. This calls into libdyld, and in dyld3::AllImages::runImageCallbacks libdyld's global lock (AllImages.h:255) is taken through a call to withNotifiersLock (AllImages.cpp:382).
As part of runImageCallbacks there are a number of notifier blocks to be called. Before or while these are called, we context switch to thread 12.
Thread 12 calls objc_copyImageNames, which locks libobjc's runtime lock (objc-runtime-new.mm:5521) before calling into libdyld through a call to fname (objc-runtime-new.mm:5543, defined in objc-private.h:505). This call into libdyld eventually reaches AllImages::infoForImageMappedAt, where it attempts to take libdyld's global lock (AllImages.cpp:678) through a call to withReadLock. Thread 12 now waits here, as thread 19 has already acquired libdyld's global lock.
Both libdyld's global lock and libobjc's runtime lock are now locked by separate threads (19 and 12 respectively).
We context switch back to thread 19 and the notifier blocks are now called. One of them calls dealloc on an OS_xpc_object, which in turn makes a call that requires Objective-C method dispatch. That lands in lookUpImpOrForward, which attempts to take libobjc’s runtime lock (objc-runtime-new.mm:6427) but cannot, as it is already locked by thread 12.
Thread 19 is now waiting on a lock acquired by thread 12 (libobjc's runtime lock) and thread 12 is now waiting on a lock acquired by thread 19 (libdyld's global lock). We're now deadlocked and the system watchdog eventually terminates the process.
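To make the cycle concrete, here's a minimal sketch in C of the wait-for relationship described above. It is only a model of my reading of the crash report, not code from dyld or libobjc: the `thread_state` struct, the `LOCK_*` constants and `is_deadlocked` are all hypothetical names, and it assumes each thread holds at most one lock and waits on at most one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the two locks named in the analysis. */
enum { LOCK_NONE = -1, LOCK_DYLD_GLOBAL = 0, LOCK_OBJC_RUNTIME = 1 };

/* Simplified model: one held lock and one awaited lock per thread. */
typedef struct {
    int thread_id;  /* thread number from the crash report */
    int holds;      /* lock currently held, or LOCK_NONE */
    int waits_for;  /* lock being waited on, or LOCK_NONE */
} thread_state;

/* Index of the thread holding `lock`, or -1 if nobody holds it. */
static int holder_of(const thread_state *threads, size_t count, int lock) {
    for (size_t i = 0; i < count; i++) {
        if (threads[i].holds == lock) return (int)i;
    }
    return -1;
}

/* Follow "waits for the holder of" edges from threads[start].
   Returns true if the chain leads back to the starting thread. */
bool is_deadlocked(const thread_state *threads, size_t count, size_t start) {
    assert(start < count);
    size_t current = start;
    for (size_t steps = 0; steps < count; steps++) {
        int wanted = threads[current].waits_for;
        if (wanted == LOCK_NONE) return false;   /* this thread can make progress */
        int holder = holder_of(threads, count, wanted);
        if (holder < 0) return false;            /* the wanted lock is free */
        if ((size_t)holder == start) return true; /* cycle back to the start */
        current = (size_t)holder;
    }
    return false; /* any cycle found does not include the starting thread */
}
```

Filling the model in with thread 19 (holds `LOCK_DYLD_GLOBAL`, waits for `LOCK_OBJC_RUNTIME`) and thread 12 (holds `LOCK_OBJC_RUNTIME`, waits for `LOCK_DYLD_GLOBAL`) reports a cycle from either thread, which matches the two stacks both parked in `_os_unfair_lock_lock_slow`.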
I've filed this as bug report FB8971497. Let me know if this sounds about right though.
I suspect we’ve begun to see this issue more often in our app recently because we’ve begun modularising, which has increased the number of dynamic libraries that we link. As this deadlock depends on concurrency and timing, it may be that the increased number of dynamic libraries makes thread 12's (com.google.FIRCoreDiagnostics) call to objc_copyImageNames take longer, when previously it would have finished by the time thread 19 calls dlopen.
Potentially related, with no replies: https://developer.apple.com/forums/thread/127335. Threads 12 and 19 in the crash report linked in the original post (https://gist.github.com/SquaredTiki/11a58e6837029e44da6098b34486658d) exhibit a similar pattern of frames.
This appears to have been resolved as of beta 7.
The issue here is that previously working code is now being hit by this check. I took a look at our investigation of this (r. 66931425) but it’s too early for me to post any concrete details. I hope that’ll change soon (-:
Unfortunately, the response on my bug report FB8128103 doesn't seem to suggest any investigation. Despite this being a binary compatibility issue, the response I received, in the form of the following message, asked me to close my report:
This is an issue specific to a third-party, not an Apple issue. This is a Realm bug that they’re tracking: https://github.com/realm/realm-cocoa/issues/6671 Please contact Realm for further support. Please close your feedback report, or let us know if this is still an issue for you.
I've responded accordingly re. binary compatibility though, so fingers crossed this gets another look.
The issue persists in beta 5.
Nothing yet for my bug report, still marked as Open.
Just updated to beta 4 and the issue persists.
I have filed bug report FB8128103. I wasn't 100% sure which component to select so I chose UIKit; however, if it seems better placed under something else on your side, please do update it so it lands with the right team.
Thanks Quinn. I stumbled upon some C code for that and was writing it up in Swift too, so thanks for saving me the time and helping me debug the cause.
I was able to reproduce this on my personal device, which let me see the relevant Console log regarding suspension and termination and narrow down the file that is triggering the problem:
[application<…>:3879] Terminating with context: <RBSTerminateContext| domain:15 code:0xDEAD10CC explanation:[application<…>:3879] was suspended with locked system files:
/var/mobile/Containers/Shared/AppGroup/A66EB78A-2BBC-49D4-BDEA-6A2AF7E8A5A6/default.realm.lock
not in allowed directories:
/var/mobile/Containers/Data/Application/E1435A44-ABC6-4254-B547-B5423D9FCAB1
/var/mobile/Containers/Data/Application/E1435A44-ABC6-4254-B547-B5423D9FCAB1/tmp reportType:CrashLog maxTerminationResistance:Interactive>
This points to Realm's default.realm.lock being the locked file, which is not permitted because it sits within the App Group container as opposed to the app's own container (which I presume is the first 'allowed directory').
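As a way of double-checking my reading of that log, here's a small C sketch that tests whether a path sits inside any of a list of allowed directories. This is purely my assumption about what "not in allowed directories" means (a path-prefix check on directory boundaries); I don't know how the system actually makes this decision, and `path_is_within` and `lock_file_is_allowed` are hypothetical helpers, not system APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if `path` is `dir` itself or sits somewhere beneath it.
   Assumption: a plain prefix match on a '/' boundary, no symlink resolution. */
bool path_is_within(const char *path, const char *dir) {
    assert(path != NULL && dir != NULL);
    size_t n = strlen(dir);
    /* Ignore trailing slashes on the directory, but keep a lone "/". */
    while (n > 1 && dir[n - 1] == '/') n--;
    if (n == 0 || strncmp(path, dir, n) != 0) return false;
    if (n == 1 && dir[0] == '/') return true; /* every absolute path is under "/" */
    /* Require a boundary so ".../ABC" does not match ".../ABC-other". */
    return path[n] == '\0' || path[n] == '/';
}

/* True if the locked file lies inside at least one allowed directory. */
bool lock_file_is_allowed(const char *lock_path,
                          const char *const *allowed, size_t allowed_count) {
    for (size_t i = 0; i < allowed_count; i++) {
        if (path_is_within(lock_path, allowed[i])) return true;
    }
    return false;
}
```

Run against the paths in the log above, the App Group path for default.realm.lock falls under neither of the two listed directories, which is consistent with the termination message.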
Whilst this explains the cause of the crash on beta 3, it doesn't explain why this only started occurring on beta 3.
I will file a bug report as you suggest with respect to binary compatibility, but any insights you might be able to provide now that we've narrowed down the affected file would be much appreciated too!