Realtime threading on macOS

Hi guys,


I am working on a video application that needs realtime performance. At first I used GCD, but then I saw a WWDC video that explained the priority decay mechanism.

My application is quite simple. I have an external API call that waits for the next incoming video frame. So instead of using GCD I implemented a pthread that opts out of the priority decay mechanism and runs at a very high priority, as shown in the WWDC video.

This dispatcher thread only waits for the next frame sync and then starts several worker threads using condition flags. All the worker threads also opt out of priority decay and run at the same high priority as the dispatcher. They do not do much: some encode Metal commands, others read UDP data or fetch the new video frame from the hardware board.


When I used Instruments I saw that all worker threads were scheduled across all available cores and blocked them all for 10 ms. The UI becomes unresponsive as well.


Here is a short version of the code. Maybe I have misunderstood something about pthreads, so please excuse me if this is a silly mistake in my code. GCD would be much nicer to use, but from the WWDC video it seems there is no way to opt out of priority decay with it.


void *dispatcherThread(void *params)
{
    while( !forcedExit ) {
        waitForNextFrame();
        pthread_mutex_lock(&mutex);
        needsCaptureFrame = true;
        needsProcessFrame = true;
        needsPlayoutFrame = true;
        pthread_cond_broadcast(&condition);
        pthread_mutex_unlock(&mutex);
    }
    pthread_exit(NULL);
}


void *workerThreadProcessFrame(void *params)
{
    while( !forcedExit ) {
        pthread_mutex_lock(&mutex);
        while (!needsProcessFrame && !forcedExit)
            pthread_cond_wait(&condition, &mutex);
        needsProcessFrame = false;
        pthread_mutex_unlock(&mutex);
       
        if (!forcedExit) {
            processFrame();
        }
    }
    pthread_exit(NULL);
}


The C function processFrame itself is bound to a Swift function. This works pretty well. The only problem is that every 40 ms all worker threads block all cores of the Mac for 10 ms, even when their Swift function returns within a few microseconds.


Here is also the code snippet showing how the pthreads are created.


void startThread(void * _Nullable (* _Nonnull start_routine)(void * _Nullable)) {
    pthread_t thread;
    pthread_attr_t attr;
   
    int returnVal;
   
    // create attributes with the standard values
    returnVal = pthread_attr_init(&attr);
    assert(!returnVal);
   
    // set the detach state attribute (we don't need return values and therefore no pthread_join)
    returnVal = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    assert(!returnVal);
   
    // set the scheduling policy to round robin to avoid priority decay (this is very important!!!)
    pthread_attr_setschedpolicy(&attr, SCHED_RR);
   
    // the thread priority is set to 45 which seems to be a good value on the mac
    struct sched_param param = {.sched_priority = 45};
    pthread_attr_setschedparam(&attr, &param);
   
    int threadError = pthread_create(&thread, &attr, start_routine, NULL);
    assert(!threadError);
   
    returnVal = pthread_attr_destroy(&attr);
    assert(!returnVal);
}
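The snippet above hard-codes priority 45 and ignores the return values of pthread_attr_setschedpolicy and pthread_attr_setschedparam. Here is a minimal sketch of how the valid SCHED_RR range could be queried and every call checked. The helper names clamp_rr_priority and init_rr_attr are made up for illustration and are not part of the original code; the actual range depends on the system.

```c
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: clamp a requested priority into the range the
   system reports for SCHED_RR. If the range cannot be queried, the
   requested value is returned unchanged. */
static int clamp_rr_priority(int requested)
{
    int lo = sched_get_priority_min(SCHED_RR);
    int hi = sched_get_priority_max(SCHED_RR);
    if (lo == -1 || hi == -1)
        return requested;
    if (requested < lo)
        return lo;
    if (requested > hi)
        return hi;
    return requested;
}

/* Hypothetical helper: prepare attributes for a detached SCHED_RR thread.
   Returns 0 on success or the first pthread error code. On failure the
   attribute object has already been destroyed. */
static int init_rr_attr(pthread_attr_t *attr, int requested_priority)
{
    int err = pthread_attr_init(attr);
    if (err != 0)
        return err;

    struct sched_param param = {
        .sched_priority = clamp_rr_priority(requested_priority)
    };

    /* Set the policy before the priority so the priority is validated
       against the SCHED_RR range. */
    if ((err = pthread_attr_setdetachstate(attr, PTHREAD_CREATE_DETACHED)) == 0 &&
        (err = pthread_attr_setschedpolicy(attr, SCHED_RR)) == 0 &&
        (err = pthread_attr_setschedparam(attr, &param)) == 0)
        return 0;

    pthread_attr_destroy(attr);
    return err;
}
```

A caller would pass the prepared attributes to pthread_create and then call pthread_attr_destroy, as startThread already does. Checking the result of pthread_create instead of asserting on it makes a rejected policy or priority visible at runtime.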


I would be really happy if someone has an idea why this dispatch/worker mechanism does not work, or if there is a GCD solution that avoids the priority decay problem.

Replies

After some further testing it seems that using conditions together with realtime pthreads was the problem. The pthread_cond_wait calls block the CPU cores and burn a huge number of CPU cycles. When I develop on a MacBook Pro it gets very hot. So I changed the code to one dispatcher thread (round robin and a high priority), and this thread creates the worker threads each time a new video frame arrives. The worker threads exit directly after they finish their task.


This is not a good solution because creating new threads on every rendering cycle has a lot of disadvantages.


IMHO the best solution for realtime performance would be one dispatcher thread and a few worker threads that all wait for a condition signal from the dispatcher before starting their tasks. But they should not block a CPU core or burn a lot of CPU cycles while waiting. And they should start their work immediately and nearly simultaneously after the dispatcher sends the signal. A kind of thread pool for realtime threads.


But this does not seem to work. Either I set a high thread priority, which causes the threads to block the CPU cores and heat up the machine, or I lower the priority, but then I have a thread pool where the worker threads can wait a long time before the scheduler starts them and where they can be interrupted by other threads.


Edit: Solved it. Realtime thread programming is really a challenge. It is only a few lines of code, but when you mess them up nothing works. The only question still open is whether the threads must use round-robin scheduling. As far as I can see, RR is the only way to avoid priority decay, where another thread stops the thread from running and the scheduler moves it back to the priority queue. The disadvantage of RR is IMHO that the threads' execution time is not stable when I use several worker threads with the same priority.
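For readers who want to try the dispatcher-plus-pool layout described in this reply, here is one possible sketch. It is not the fix that was actually applied, which the post does not show. Instead of one boolean flag per worker it uses a shared frame counter, so each worker sleeps in pthread_cond_wait until the counter differs from the last value it saw. The names frame_gate, frame_gate_publish, frame_gate_wait, frame_gate_stop, worker_ctx and example_worker are all invented for this example.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared state between one dispatcher and many workers. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  frame_ready;
    uint64_t        frame_generation; /* incremented once per frame */
    bool            stop;
} frame_gate;

static void frame_gate_init(frame_gate *g)
{
    pthread_mutex_init(&g->mutex, NULL);
    pthread_cond_init(&g->frame_ready, NULL);
    g->frame_generation = 0;
    g->stop = false;
}

/* Dispatcher side: announce a new frame and wake every waiting worker. */
static void frame_gate_publish(frame_gate *g)
{
    pthread_mutex_lock(&g->mutex);
    g->frame_generation++;
    pthread_cond_broadcast(&g->frame_ready);
    pthread_mutex_unlock(&g->mutex);
}

/* Ask all workers to leave their loops. */
static void frame_gate_stop(frame_gate *g)
{
    pthread_mutex_lock(&g->mutex);
    g->stop = true;
    pthread_cond_broadcast(&g->frame_ready);
    pthread_mutex_unlock(&g->mutex);
}

/* Worker side: sleep until the generation differs from *seen.
   Returns false once stop has been requested. */
static bool frame_gate_wait(frame_gate *g, uint64_t *seen)
{
    pthread_mutex_lock(&g->mutex);
    while (g->frame_generation == *seen && !g->stop)
        pthread_cond_wait(&g->frame_ready, &g->mutex);
    bool running = !g->stop;
    *seen = g->frame_generation;
    pthread_mutex_unlock(&g->mutex);
    return running;
}

/* Hypothetical per-worker context, used only to show the loop shape. */
typedef struct {
    frame_gate *gate;
    int         frames_handled;
} worker_ctx;

static void *example_worker(void *arg)
{
    worker_ctx *ctx = arg;
    uint64_t seen = 0;
    while (frame_gate_wait(ctx->gate, &seen))
        ctx->frames_handled++; /* processFrame() would be called here */
    return NULL;
}
```

With this layout each worker keeps its own seen value, so one worker consuming a frame cannot clear the wakeup for another. A worker that falls behind skips to the newest frame rather than queueing up work, which may or may not be what a video pipeline wants.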

macOS is not a realtime system. If you are doing video work, then you need to be using the appropriate APIs that use the GPU. I don’t know what those are, but I know you aren’t going to be able to use pthreads to beat the system into submission. Pthreads is quite difficult to use. GCD is much easier to manage. In particular, you may be getting hit by spurious wakeups. Just because pthread_cond_wait returns doesn’t mean your condition is ready. You must manually check it.
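To illustrate the spurious wakeup point, here is a minimal sketch of the usual pattern: the flag is changed only while the mutex is held, and the waiter re-checks it in a while loop after every return from pthread_cond_wait. The names ready, mark_ready, wait_until_ready and example_waiter are placeholders, not taken from the code in the question.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t ready_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  ready_cond  = PTHREAD_COND_INITIALIZER;
static bool            ready       = false;

/* Signaller: change the flag under the mutex, then wake one waiter. */
static void mark_ready(void)
{
    pthread_mutex_lock(&ready_mutex);
    ready = true;
    pthread_cond_signal(&ready_cond);
    pthread_mutex_unlock(&ready_mutex);
}

/* Waiter: a return from pthread_cond_wait is only a hint, so the flag
   is tested again before the loop is left. */
static void wait_until_ready(void)
{
    pthread_mutex_lock(&ready_mutex);
    while (!ready)
        pthread_cond_wait(&ready_cond, &ready_mutex);
    ready = false; /* consume the event while still holding the mutex */
    pthread_mutex_unlock(&ready_mutex);
}

/* Example thread body: blocks until mark_ready() is called, then
   records that it ran through the int passed as its argument. */
static void *example_waiter(void *arg)
{
    wait_until_ready();
    *(int *)arg = 1;
    return NULL;
}
```

Using if instead of while in wait_until_ready would let the waiter continue after a wakeup that was not caused by mark_ready. The loop also covers the case where mark_ready runs before the waiter reaches pthread_cond_wait, because the flag is tested before the first wait.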

Hi John,


I think I managed it. In my opinion the Mac is a perfect machine for realtime processing, especially the new Mac Pro. It was more or less my fault. Pthread programming seems easy but is sometimes a big challenge. But the Xcode Instruments tools are really great for finding the bugs. pthread_cond_wait and pthread_cond_broadcast work fine using round-robin threads with a priority of 48. The problem was how I checked the flags in the while loop and where I set their values.

The only problems that remain are small performance peaks visible in the debugger, small differences in when the worker threads start, and execution times that are not constant. They are nearly the same, but you can see small differences even when the worker threads do the same work. I think the reason lies in the scheduler and my use of round-robin scheduling. But in the debugger I see that each thread is running on its own CPU core, so I must investigate this behaviour a little more. Maybe there is still some room for improvement. But the CPU usage is now good, and all the threads behave consistently over time without lags. And the UI stays responsive.