dispatch_async to global much slower in recent Mac Catalyst versions?

Question

diffent OP

Created May ’23

Replies 3

Participants 2

Has anyone else noticed much slower GCD runs in newer macOS / Catalyst versions?

This seems like it used to be blazing fast:

dispatch_async(dispatch_get_global_queue(QOS_CLASS_USER_INTERACTIVE, 0), ^{ /* code to run */ });

Now, if I run a block on this type of queue instead of on the main thread, the dispatched code runs much slower than it does on the main thread. Not 10% slower, but several times slower. I am not sure yet whether the extra time is in the code itself or in the delay before the dispatched block starts. I am trying to narrow down what the problem is on our side and get some metrics, but if anyone has seen this issue, it might be useful to compare notes.
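One way to separate those two possibilities is to take three timestamps: just before the job is handed off, when it starts running, and when it finishes. The sketch below is only an illustration; the `timed_job` type and its helper functions are hypothetical names, not part of GCD, and it assumes `clock_gettime` with `CLOCK_MONOTONIC` is available.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/* Hypothetical helper type: one unit of work plus three timestamps. */
typedef struct {
    struct timespec submitted;  /* taken just before handing the job off */
    struct timespec started;    /* taken when the job begins executing */
    struct timespec finished;   /* taken when the job's work returns */
    void (*work)(void);         /* the code being measured (may be NULL) */
} timed_job;

static double seconds_between(struct timespec a, struct timespec b) {
    return (double)(b.tv_sec - a.tv_sec) + (double)(b.tv_nsec - a.tv_nsec) / 1e9;
}

/* Call on the submitting thread, immediately before dispatching the job. */
void timed_job_mark_submitted(timed_job *job) {
    clock_gettime(CLOCK_MONOTONIC, &job->submitted);
}

/* Runs the job and records when it started and finished.
   The void (*)(void *) shape matches what dispatch_async_f expects. */
void timed_job_run(void *context) {
    timed_job *job = (timed_job *)context;
    clock_gettime(CLOCK_MONOTONIC, &job->started);
    if (job->work != NULL) job->work();
    clock_gettime(CLOCK_MONOTONIC, &job->finished);
}

/* Seconds between hand-off and the moment the job began running. */
double timed_job_latency(const timed_job *job) {
    return seconds_between(job->submitted, job->started);
}

/* Seconds the job's own work took once it was running. */
double timed_job_runtime(const timed_job *job) {
    return seconds_between(job->started, job->finished);
}
```

On Apple platforms the intended use would be to call `timed_job_mark_submitted(&job)` and then `dispatch_async_f(dispatch_get_global_queue(QOS_CLASS_USER_INTERACTIVE, 0), &job, timed_job_run)`, reading the two durations only after the job has finished (for example by waiting on a dispatch group). A large latency points at dispatch; a large runtime points at the code in the block.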

Answer 1

diffent OP

May ’23

I think I have traced the problem to drand48(), the C library random number generator, which I use extensively.

It seems that when you call drand48 a lot in the block sent to dispatch, the threads that run the block get serialized or otherwise jammed up by this function. Your code doesn't speed up by the expected amount when you dispatch concurrent blocks; it runs at the same speed as on one thread, or slower (slower because of the extra overhead of the other threads, dispatch, thread syncing, etc.).

The thing I found is that this slowdown doesn't seem to show up in the Instruments app. It shows drand48 taking up a little CPU, as expected, but not a huge amount. My guess is that the threads are not using CPU power and are just waiting on other threads for memory access. That waiting may show up in some part of Instruments I didn't look at.
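Since the profile does not make the slowdown obvious, a wall-clock comparison is one way to check the scaling claim. The sketch below is a hypothetical harness, not the code from this thread: it uses plain pthreads rather than GCD, and `time_drand48_threads` and `CALLS_PER_THREAD` are made-up names. Every thread does the same fixed amount of work, so with perfect scaling the time for several threads would stay close to the time for one.

```c
#define _XOPEN_SOURCE 700
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

enum { CALLS_PER_THREAD = 1000000, MAX_THREADS = 64 };

/* Thread body: call drand48 in a tight loop and store the sum so the
   loop is not optimized away. POSIX does not require drand48 to be
   thread-safe; the concurrent calls here only mirror the pattern
   described in the thread so that it can be timed. */
static void *draw_many(void *arg) {
    double sum = 0.0;
    for (int i = 0; i < CALLS_PER_THREAD; i++) {
        sum += drand48();
    }
    *(double *)arg = sum;
    return NULL;
}

/* Returns the wall-clock seconds needed for nthreads threads to each make
   CALLS_PER_THREAD calls to drand48, or -1.0 on error. */
double time_drand48_threads(int nthreads) {
    pthread_t threads[MAX_THREADS];
    double sums[MAX_THREADS];
    struct timespec t0, t1;
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS) return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (; started < nthreads; started++) {
        if (pthread_create(&threads[started], NULL, draw_many, &sums[started]) != 0) break;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (started != nthreads) return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec) + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Comparing `time_drand48_threads(1)` with `time_drand48_threads(4)` on a machine with at least four cores gives a rough picture: if the second number is several times the first, the calls are interfering with each other rather than running in parallel.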

This post seems to get into the details of why this is occurring:

https://stackoverflow.com/questions/22660535/pthreads-and-drand48-concurrency-performance

I will post a workaround. I am tentatively working on pre-generating some random numbers in per-thread arrays, or in a global array read through a per-thread index. If you use a global index to pull from the pre-generated randoms, it shows a slowdown similar to drand48.

Answer 2

diffent OP

May ’23

Non-optimized workaround for the thread serialization problem above, which shows up with concurrent use of drand48 across threads.

The key is the __thread keyword. If you make these variables plain static or global instead of __thread, you see similar slowdowns.

My code does not depend much on truly random numbers, so this is good enough to get rolling. Note that each thread's sequence repeats after s_nrandoms values.

There may be a better solution.

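#if 1

#include <stdlib.h>  /* drand48 */

/* An enum constant is a valid array size at file scope in both C and C++. */
enum { s_nrandoms = 10000 };

/* __thread gives every thread its own copy of the table and its cursor. */
__thread double s_someRandoms[s_nrandoms];
__thread int s_init = 0;
__thread int s_lastRandomIndex = 0;

double drand48x(void) {
    /* First call on each thread: fill that thread's table from the real drand48. */
    if (!s_init) {
        for (int i = 0; i < s_nrandoms; i++) {
            s_someRandoms[i] = drand48();
        }
        s_init = 1;
    }

    /* Step through the table, wrapping back to the start at the end. */
    s_lastRandomIndex++;
    if (s_lastRandomIndex > s_nrandoms - 1) s_lastRandomIndex = 0;

    return s_someRandoms[s_lastRandomIndex];
}

/* Must come after drand48x so the call inside it still reaches the real drand48. */
#define drand48 drand48x

#endif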
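A possible alternative that stays in the same family of generators is erand48, which POSIX defines to take the 48-bit state from caller-provided storage instead of using the state shared by drand48. The sketch below keeps that state in a __thread variable so each thread advances its own sequence without a lookup table. `drand48_local` and `drand48_local_seed` are hypothetical names, and the default seeding from a thread-local address is only an assumption made to give threads different starting points; it is not a high-quality seed.

```c
#define _XOPEN_SOURCE 700
#include <stdint.h>
#include <stdlib.h>

/* Per-thread generator state for erand48 (three 16-bit words). */
static __thread unsigned short t_state[3];
static __thread int t_seeded = 0;

/* Sets the calling thread's state from the low 48 bits of seed. */
void drand48_local_seed(uint64_t seed) {
    t_state[0] = (unsigned short)(seed & 0xFFFF);
    t_state[1] = (unsigned short)((seed >> 16) & 0xFFFF);
    t_state[2] = (unsigned short)((seed >> 32) & 0xFFFF);
    t_seeded = 1;
}

/* Returns a double in [0.0, 1.0), advancing only the calling thread's state. */
double drand48_local(void) {
    if (!t_seeded) {
        /* Assumption: the address of a thread-local variable differs per
           thread, which is enough to start each thread at a different point. */
        drand48_local_seed((uint64_t)(uintptr_t)&t_state);
    }
    return erand48(t_state);
}
```

Calling `drand48_local_seed` with a known value at the start of each thread's work makes that thread's sequence reproducible; without it, the first call to `drand48_local` seeds itself.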

Answer 3

DTS Engineer OP

Apple

May ’23

Do you have to use drand48 specifically? Because there are better random number generators out there.

Share and Enjoy
—
Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"
