Are there performance advantages of using pthread vs Dispatch?

I'm working on a graphics application, and I'm experimenting with parallelizing the rendering and simulation. My first attempt has been using three queues: the main queue for simulation, a queue for rendering, and a queue for synchronization:


    let renderQueue = DispatchQueue(label: "render", qos: .userInitiated)
    let syncQueue = DispatchQueue(label: "sync", qos: .userInitiated)
    
    // The currentState is the shared state between rendering and simulation
    var currentState = SimState()


    func run() {
        DispatchQueue.main.async {
            while true {
                let newState = simulation.update(currentState)
                syncQueue.sync { currentState = newState }
            }
        }
        renderQueue.async {
            while true {
                var renderState: SimState!
                syncQueue.sync { renderState = currentState }
                renderer.render(renderState)
            }
        }
    }


So this works, but with performance stutters. When I profile the application, I can see that there are periods where all my queues are blocked at the same time.


I notice that in the slides for Apple's "Metal Game Performance Optimization" session, they are actually using pthread primitives for parallelization.


So is pthread just more suitable for performance-critical parallelization, or is Dispatch still suitable for this application?

Replies

I think GCD is perfectly reasonable. However, I see a few fundamental architectural problems in your design.


1) You have a "while true" loop in the main thread? How does your app even work?

2) You seem to be sharing a single state between the render queue and the main queue. If you have multiple, long-running threads of execution, it would be better to adopt a producer/consumer model. Instead of trying to sync a shared state, just push data from one to the other. The sync queue would still be used to manage the worker queue. (By worker "queue", I mean a conceptual data queue rather than a GCD structure.)

3) An even better idea would be to use a GCD queue as a conceptual queue. Instead of having two long-running tasks, you have one "simulation.update" task that is triggered asynchronously in response to some event. Each time it is called, it pushes data via async dispatch to the render queue.

4) And you probably shouldn't be using all user-initiated priorities.
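A minimal sketch of the producer/consumer idea from points 2 and 3 above; all names here (`Pipeline`, `step`, `renderedCount`) are invented for illustration, and an array stands in for the actual renderer:

```swift
import Dispatch

// Hypothetical stand-ins for the poster's types.
struct SimState { var tick = 0 }

final class Pipeline {
    private let renderQueue = DispatchQueue(label: "render")
    private var rendered: [SimState] = []   // stands in for the renderer

    // Producer side: each update owns its state and pushes a value copy
    // to the consumer, so there is no shared mutable state and no
    // separate sync queue.
    func step(_ state: inout SimState) {
        state.tick += 1
        let snapshot = state                // small struct: cheap value copy
        renderQueue.async {
            self.rendered.append(snapshot)  // consumer "renders" the snapshot
        }
    }

    // Reading through the serial queue also drains pending frames.
    var renderedCount: Int { renderQueue.sync { rendered.count } }
}
```

The serial render queue itself acts as the conceptual data queue: every `async` call enqueues one chunk of work carrying its own copy of the state.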

First up, I want to reference my post (ironically, this is back on Swift Forums :-) explaining why it’s not the best idea to use Dispatch to run a block and then have that block run indefinitely. In general, it’s better to use a thread for that.

Second, I have to ask more about your goals here. Most graphics applications are tied to a frame rate, that is, they want to be able to update the simulation and then render the simulation N times a second. The outline you’ve proposed doesn’t show any sign of that. Your simulation update code runs as fast as possible.

Now, there are some cases where it makes sense to run your simulation as fast as possible — for example, if you’re rendering the simulation to a movie — but if you’re rendering the simulation to the screen then you pretty much always want to tie both the simulation and the render to the frame rate.

Can you explain more about your goals here?

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"

> 1) You have a "while true" loop in the main thread? How does your app even work?


So this is a bit of a toy example, because I am mostly trying to prove the concept at this point, but it does indeed work. The app is using GLFW which polls inputs on the main thread, and GLFW is actually designed to work on a tight loop like this. I actually would like to have the update queue operate on a timer instead, but that's a different topic.


> 2) You seem to be sharing a single state between render queue and the main queue. If you have multiple, long-running threads of execution, it would be better to adopt a producer/consumer model. Instead of trying to sync a shared state, just push data from one to the other. The sync queue would still be used to manage the worker queue. (By worker "queue", I mean a conceptual data queue rather than a GCD structure.)


Interesting idea. When you say "push data from one to the other" - can you go into a little more depth on how that would work?


I think the syncing is already quite minimal here, though: currentState is a small struct which is immutable during update and is copied by the render queue, so the sync is only required to guarantee that the swap on the update thread does not happen at the same time as the copy on the render thread.


> 4) And you probably shouldn't be using all user-initiated priorities.


What would be the correct priority in this case? From what I could find, it's not well documented what these priorities mean or what they guarantee.

> First up, I want to reference my post (ironically, this is back on Swift Forums :-) explaining why it’s not the best idea to use Dispatch to run a block and then have that block run indefinitely. In general, it’s better to use a thread for that.


Thanks for the link. I notice you are using `sleep` here to time the loop: I've read elsewhere that `sleep` is less than ideal for applications like this, because the scheduler may sleep the thread for longer than the supplied interval. Is this the best method for executing code on a very precise interval, or are there better alternatives?


> Most graphics applications are tied to a frame rate, that is, they want to be able to update the simulation and then render the simulation N times a second. The outline you’ve proposed doesn’t show any sign of that. Your simulation update code runs as fast as possible.


So this detail is not revealed in the sample above, but within the update method it's actually putting that thread to sleep so that it is rate-limited.


As far as my goals: the main purpose of this exercise is actually to experiment with the extent to which I can decouple rendering from update code. I.e. in the case that the rendering is GPU bound and cannot execute reliably within one timestep, I would like to see if I can keep the input polling and simulation executing at regular intervals on a separate thread.

That was more of a rhetorical question. Even if you have some other library doing something funky, don't put a spin loop in the main queue.


When you have that shared state, you have two threads running somewhat in lock-step. You could also be dropping updates. The producer/consumer model is a basic multithreading architecture. I don't know if it would be appropriate for your app, but it does solve this particular problem. You have one source that is generating data in discrete chunks to be worked on. Then you have a sink that is consuming those chunks and doing the necessary work. Your simulator is generating simulated events and the renderer is consuming them.


Here is the documentation for dispatch quality of service: https://developer.apple.com/library/archive/documentation/Performance/Conceptual/EnergyGuide-iOS/PrioritizeWorkWithQoS.html


Generally, it is better to develop using lower priorities. That way, if there is a performance problem and you need to re-do your entire architecture, you will notice it and fix it before end users see it.
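Following that advice, the queues from the original example could start at a lower quality-of-service class and be raised only where profiling justifies it (a sketch; the labels and the choice of `.utility` are assumptions, not a recommendation for any particular app):

```swift
import Dispatch

// Start both queues at a modest QoS class; promote a queue only after
// profiling shows its work is actually falling behind.
let simQueue = DispatchQueue(label: "sim", qos: .utility)
let renderQueue = DispatchQueue(label: "render", qos: .utility)
```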


From your other reply you state that you are doing both a spin loop and a sleep in the main thread. Don't do either.


Now that I look through this again, I think you are generally on the right track. You aren't generating "events", you are generating "scenes". It is OK to drop scenes in this context. I think you just need to get all that off the main queue. Run your simulation in its own queue. You still don't want to have "while true". You want to be able to gracefully stop it, if necessary. The sync queue is fine. I'm not familiar with graphics software, but when I wrote real-time simulators, I would fix the simulator (producer) rate at twice the "render" (consumer) rate so that everything ran smoothly.
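A sketch of what "run it in its own queue, but stoppably" could look like; the class and member names are invented, and a lock-guarded flag is used for portability (an atomic would also work):

```swift
import Dispatch
import Foundation

// Illustrative stoppable simulation loop: long-running work lives on its
// own queue, not the main queue, and checks a flag instead of spinning
// in an unconditional "while true".
final class SimulationLoop {
    private let simQueue = DispatchQueue(label: "sim")
    private let lock = NSLock()
    private var stopped = false
    private(set) var ticks = 0

    private var isStopped: Bool {
        lock.lock(); defer { lock.unlock() }
        return stopped
    }

    func start(onTick: @escaping () -> Void) {
        simQueue.async {
            while !self.isStopped {
                self.ticks += 1
                onTick()                // simulation.update(...) goes here
            }
        }
    }

    func stop() {
        lock.lock(); stopped = true; lock.unlock()
        simQueue.sync { }               // wait until the loop has exited
    }
}
```

The `simQueue.sync { }` in `stop()` is the graceful-shutdown point: it returns only after the loop body has observed the flag and drained.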

If we take into consideration the way people schedule work on Metal — using triple buffering to use the CPU and GPU efficiently, and using GCD for multithreading — I doubt that the bottleneck is pthread vs. GCD. It's more likely about data transfer to the GPU.


Take a look here: https://developer.apple.com/documentation/metal/advanced_command_setup/cpu_and_gpu_synchronization
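The core of the pattern that article describes is a semaphore gating how many frames the CPU may encode ahead of the GPU. A sketch of that mechanism, with a serial dispatch queue standing in for the GPU command queue so no Metal is needed:

```swift
import Dispatch

// Triple-buffering skeleton: the semaphore starts at 3, so the CPU can
// encode at most 3 frames before the "GPU" completes one.
let maxFramesInFlight = 3
let frameBoundarySemaphore = DispatchSemaphore(value: maxFramesInFlight)
let gpuQueue = DispatchQueue(label: "gpu")   // stand-in for MTLCommandQueue

var framesCompleted = 0
let totalFrames = 12

for frame in 0..<totalFrames {
    // The CPU blocks here only when it is already 3 frames ahead, so
    // encoding and execution overlap instead of running in lock-step.
    frameBoundarySemaphore.wait()
    let bufferIndex = frame % maxFramesInFlight  // which buffer slot to encode
    _ = bufferIndex

    // Plays the role of commandBuffer.addCompletedHandler { ... }:
    gpuQueue.async {
        framesCompleted += 1
        frameBoundarySemaphore.signal()          // release the slot to the CPU
    }
}
gpuQueue.sync { }                                // drain the "GPU"
```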

> Is this the best method for executing code on a very precise interval … ?

Definitely not. My post only uses `sleep` because that’s what the originator of that thread used. If you want to hit specific deadlines you definitely wouldn’t use `sleep`.

As to what you would use, that very much depends on the context in which the code is running. For example, the best practice for real-time audio rendering code is very different from that of a game main loop.

> As far as my goals: the main purpose of this exercise is actually to experiment with the extent to which I can decouple rendering from update code. I.e. in the case that the rendering is GPU bound and cannot execute reliably within one timestep, I would like to see if I can keep the input polling and simulation executing at regular intervals on a separate thread.

One of the golden rules of priority schemes is that in order for someone to win, someone else has to lose. If you want to guarantee that your simulation code hits its real-time goal, you have to give it a higher priority than your rendering code. There’s no problem doing that, but you have to understand the consequences. Your simulation code may take cycles away from your rendering code, making things glitch more. That in and of itself might not be a problem, but you may see secondary effects. For example, a game with glitchy rendering may be hard to control because you don’t get immediate feedback.

Share and Enjoy

Quinn “The Eskimo!”
Apple Developer Relations, Developer Technical Support, Core OS/Hardware

let myEmail = "eskimo" + "1" + "@apple.com"