dispatch_async to global queue much slower in recent Mac Catalyst versions?

has anyone else noticed GCD running much slower in newer macOS / Catalyst versions?

this seems like it used to be blazing fast:

dispatch_async(dispatch_get_global_queue(QOS_CLASS_USER_INTERACTIVE, 0), ^{ /* code to run */ });

now if i run a block on this type of queue instead of on the main thread, the dispatched code runs much slower. not 10% slower, but several times slower. i am not sure yet whether the extra time is in the code itself or in the delay before the dispatched block starts. i am trying to narrow down the problem on our side and get some metrics, but if anyone has seen this issue, it would be useful to compare notes.

Replies

i think i have traced the problem to drand48(), the C library random number generator (a library function, not a system call), which i use extensively

it seems like when you call drand48 a lot in the block sent to dispatch, the threads that run the block get serialized or otherwise jammed up by this function. so your code doesn't speed up by the expected amount when you dispatch concurrent blocks. it runs at the same speed as on 1 thread, or slower (slower because of the extra overhead of the other threads, dispatch, thread syncing, etc)

the thing i found is that this slowdown doesn't seem to show up in the Instruments app. it shows drand48 taking up a little CPU, as expected, but not a huge amount. my guess is that the threads are not burning CPU but waiting on each other for memory access. that kind of waiting may show up in some part of Instruments i didn't look at.
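One way to check for the effect without Instruments is to time the wall clock directly. The following is a rough sketch, not the code from this project: it uses plain pthreads instead of GCD, and the names `time_workers`, `worker_arg`, and `kMaxThreads` are made up for illustration. It runs the same loop either through drand48(), whose state is shared by all threads, or through erand48(), which takes state supplied by the caller.

```c
#define _XOPEN_SOURCE 600   /* asks glibc to declare drand48, erand48, clock_gettime */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

enum { kMaxThreads = 16 };  /* hypothetical cap, only for this sketch */

typedef struct {
    long calls;              /* random numbers this worker draws */
    int use_shared;          /* 1: drand48() (shared state), 0: erand48() (own state) */
    unsigned short state[3]; /* private generator state for erand48() */
    double sum;              /* stored so the loop is not optimized away */
} worker_arg;

static void *worker(void *p) {
    worker_arg *a = p;
    double sum = 0.0;
    for (long i = 0; i < a->calls; i++) {
        /* POSIX does not require drand48() to be thread-safe; it is called
           concurrently here only to reproduce the pattern under test. */
        sum += a->use_shared ? drand48() : erand48(a->state);
    }
    a->sum = sum;
    return NULL;
}

/* Runs nthreads workers to completion; returns elapsed wall-clock seconds. */
double time_workers(int nthreads, long calls_per_thread, int use_shared) {
    assert(nthreads >= 1 && nthreads <= kMaxThreads);
    pthread_t tid[kMaxThreads];
    worker_arg arg[kMaxThreads];
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    int started = 0;
    for (int i = 0; i < nthreads; i++) {
        arg[i].calls = calls_per_thread;
        arg[i].use_shared = use_shared;
        arg[i].state[0] = 1;
        arg[i].state[1] = 2;
        arg[i].state[2] = (unsigned short)(i + 3);  /* arbitrary per-thread seed */
        arg[i].sum = 0.0;
        if (pthread_create(&tid[started], NULL, worker, &arg[i]) != 0) break;
        started++;
    }
    for (int i = 0; i < started; i++) pthread_join(tid[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (double)(t1.tv_sec - t0.tv_sec) + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Comparing `time_workers(1, n, 1)` with `time_workers(4, n, 1)`, and then the same pair with `use_shared` set to 0, should indicate whether the shared generator is what keeps the threads from scaling.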

this post seems to get into the details of why this is occurring:

https://stackoverflow.com/questions/22660535/pthreads-and-drand48-concurrency-performance

will post a workaround. tentatively working on pre-generating some randoms, either in per-thread arrays or in a global array read through a per-thread index. note that if you use a global index to pull from the pre-generated randoms, it shows a similar slowdown to drand48

Non-optimized workaround for the thread serialization problem above, which comes from calling drand48 concurrently on several threads.

The key is the __thread keyword. If you make these variables plain static or global instead of __thread, you see similar slowdowns.

My code does not depend so much on pure random numbers so this is good enough to get rolling.

There may be a better solution.

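#if 1

#include <stdlib.h>

// Size of each thread's table. An enum rather than a const int, so that it
// is a constant expression usable as an array size in plain C as well.
enum { s_nrandoms = 10000 };

// __thread gives every thread its own copy of these variables, so threads
// never touch each other's table or index.
__thread double s_someRandoms[s_nrandoms];
__thread int s_init = 0;
__thread int s_lastRandomIndex = 0;

double drand48x(void) {
    // First call on each thread: fill that thread's table from the real drand48.
    if (!s_init) {
        for (int i = 0; i < s_nrandoms; i++) {
            s_someRandoms[i] = drand48();
        }
        s_init = 1;
    }

    // Return the current entry, then advance and wrap around, so the values
    // repeat every s_nrandoms calls.
    double r = s_someRandoms[s_lastRandomIndex];
    s_lastRandomIndex = (s_lastRandomIndex + 1) % s_nrandoms;
    return r;
}

// Must come after drand48x so the call inside it still reaches the real drand48.
#define drand48 drand48x

#endif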
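A possible alternative to the table, offered as a sketch only and untested on Catalyst: POSIX also provides erand48(), which runs the same generator as drand48() but on state the caller passes in. Keeping that state in a __thread variable gives each thread its own generator with nothing shared. The names `drand48_local` and `seed_thread_rng`, and the seed values, are made up for illustration.

```c
#define _XOPEN_SOURCE 600   /* asks glibc to declare erand48 */
#include <stdlib.h>

/* Per-thread 48-bit generator state. Every thread starts from these same
   arbitrary values unless it calls seed_thread_rng() first, so threads that
   skip seeding all produce the same sequence. */
static __thread unsigned short t_state[3] = { 0x330E, 0x1234, 0xABCD };

/* Hypothetical helper: give the calling thread its own starting point,
   for example from a thread or work-item index. */
void seed_thread_rng(unsigned int seed) {
    t_state[0] = 0x330E;
    t_state[1] = (unsigned short)(seed & 0xFFFFu);
    t_state[2] = (unsigned short)((seed >> 16) & 0xFFFFu);
}

/* Uniform double in [0.0, 1.0), drawn from the calling thread's own state. */
double drand48_local(void) {
    return erand48(t_state);
}
```

Unlike the table, this does not repeat after 10000 values, but each thread should be seeded differently if the threads are not supposed to produce identical sequences.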

Do you have to use drand48 specifically? Because there are better random number generators out there.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

  • hello, thanks for the response! no, i don't need drand48() in particular, it is just my go-to on apple platforms. any reasonable uniform random in [0.0, 1.0] would work. i saw some other ones in the docs but haven't tested them for this thread interference issue. one thing i noticed about the random functions: they can get slower if they are "truly" random. i just need approximately random.
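For the "approximately random" case, one option is a small generator whose whole state lives in a thread-local variable. The sketch below swaps in xorshift64*, which is not something suggested in this thread and is not suitable for anything security related. The names `fast_rand_seed` and `fast_rand01` are made up, and this has not been benchmarked against drand48 here.

```c
#include <assert.h>
#include <stdint.h>

/* Per-thread generator state. Every thread starts from this same arbitrary
   nonzero value unless it calls fast_rand_seed() first. */
static __thread uint64_t t_rng = 0x9E3779B97F4A7C15ULL;

/* Seed the calling thread's generator. xorshift state must never be zero,
   because a zero state only ever produces zero. */
void fast_rand_seed(uint64_t seed) {
    assert(seed != 0);
    t_rng = seed;
}

/* Uniform double in [0.0, 1.0) from the calling thread's own state. */
double fast_rand01(void) {
    uint64_t x = t_rng;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    t_rng = x;
    uint64_t r = x * 0x2545F4914F6CDD1DULL;
    /* Keep the top 53 bits and scale by 2^-53 to land in [0, 1). */
    return (double)(r >> 11) * (1.0 / 9007199254740992.0);
}
```

Seeding each thread with a different nonzero value, such as a work-item index plus one, keeps the threads from producing identical sequences.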
