We've recently noticed an issue on our new MacStudios where calls to psort_r(3) stall forever. We haven't changed our HPC (particle simulation) code at all, and sampling the app shows psort_r is stuck in dispatch_group_wait(3).
Taking the code from Libc 1439.141.1, we've assembled our own implementation which allows us to pass a dispatch_queue_t, dispatch_group_t, and specify a wait time. After 10 seconds (a very long time in our case) the call returns with a non-zero exit code and shows the group and queue are in agreement: four additional blocks are waiting to dispatch, but haven't. This still takes more than two hours of simulation to achieve, where calls to psort_r must be succeeding to make forward progress.
Prior to this code change we've seen dispatch_group_wait stuck for hours. What else could we do to diagnose/debug this? We only see it on our M1 Ultra MacStudios, and the comparator passed to psort_r is simple C code (constant time). FB10893202