Calling compute kernels multiple times

We are working on a numerical model in which we need to call compute kernels repeatedly, tens of thousands of times.


What is the best way to do this with Metal?


The following is an example of what we currently do. I can't imagine this is the best way.


        for i in 0 ..< 10000000000000 {
            print("loop: ", i)
            // A new command buffer is created for every time step.
            let cmds = queue.makeCommandBuffer()

            // First encoder: compute the forces from the current positions.
            let encoder = cmds.makeComputeCommandEncoder()
            encoder.setComputePipelineState(ForcePipeline)
            encoder.setBytes(&numberOfParticles, length: MemoryLayout<Int>.stride, at: 0)
            encoder.setBuffer(massBuffer, offset: 0, at: 1)
            encoder.setBuffer(springkBuffer, offset: 0, at: 2)
            encoder.setBuffer(radiusBuffer, offset: 0, at: 3)
            encoder.setBuffer(positionOldBuffer, offset: 0, at: 4)
            encoder.setBuffer(forceBuffer, offset: 0, at: 5)
            encoder.dispatchThreadgroups(threadgroupsPerGrid, threadsPerThreadgroup: threadsPerThreadgroup)
            encoder.endEncoding()

            // Second encoder: Euler step using the forces computed above.
            let encoder2 = cmds.makeComputeCommandEncoder()
            encoder2.setComputePipelineState(EulerPipeline)
            encoder2.setBytes(&numberOfParticles, length: MemoryLayout<Int>.stride, at: 0)
            encoder2.setBuffer(massBuffer, offset: 0, at: 1)
            encoder2.setBuffer(forceBuffer, offset: 0, at: 2)
            encoder2.setBuffer(velocityOldBuffer, offset: 0, at: 3)
            encoder2.setBuffer(positionOldBuffer, offset: 0, at: 4)
            encoder2.setBuffer(velocityNewBuffer, offset: 0, at: 5)
            encoder2.setBuffer(positionNewBuffer, offset: 0, at: 6)
            encoder2.dispatchThreadgroups(threadgroupsPerGrid, threadsPerThreadgroup: threadsPerThreadgroup)
            encoder2.endEncoding()

            // The CPU blocks here on every iteration until the GPU finishes.
            cmds.commit()
            cmds.waitUntilCompleted()
        }

Replies

Encode more than two commands into each command buffer, and experiment to find the right batch size. Only call waitUntilCompleted on the last command buffer that you encode.

Also, there doesn't appear to be any reason to split your calls to dispatchThreadgroups across separate encoders.
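To make the batching idea concrete, here is a minimal sketch rather than a drop-in replacement for the code in the question. It assumes the current Swift Metal API (argument labels such as `index:`, optional return values) instead of the older `at:` labels used above. The kernel `step`, the function `runBatched`, and the parameter `stepsPerBuffer` are hypothetical names invented for illustration; the kernel only adds 1 to each element so the result is easy to check.

```swift
import Metal

// Hypothetical stand-in kernel: adds 1 to every element per "time step".
let source = """
#include <metal_stdlib>
using namespace metal;
kernel void step(device float *data [[buffer(0)]],
                 uint id [[thread_position_in_grid]]) {
    data[id] += 1.0f;
}
"""

// Runs `totalSteps` dispatches, `stepsPerBuffer` per command buffer,
// and blocks only on the last command buffer. Returns nil if Metal
// is unavailable or setup fails.
func runBatched(totalSteps: Int, stepsPerBuffer: Int) -> [Float]? {
    guard let device = MTLCreateSystemDefaultDevice(),
          let queue = device.makeCommandQueue(),
          let library = try? device.makeLibrary(source: source, options: nil),
          let function = library.makeFunction(name: "step"),
          let pipeline = try? device.makeComputePipelineState(function: function)
    else { return nil }

    let count = 1024
    let zeros = [Float](repeating: 0, count: count)
    guard let buffer = device.makeBuffer(bytes: zeros,
                                         length: count * MemoryLayout<Float>.stride,
                                         options: .storageModeShared)
    else { return nil }

    let threadsPerThreadgroup = MTLSize(width: 64, height: 1, depth: 1)
    let threadgroupsPerGrid = MTLSize(width: count / 64, height: 1, depth: 1)

    var lastBuffer: MTLCommandBuffer?
    var done = 0
    while done < totalSteps {
        guard let cmds = queue.makeCommandBuffer(),
              let encoder = cmds.makeComputeCommandEncoder()
        else { return nil }
        encoder.setComputePipelineState(pipeline)
        encoder.setBuffer(buffer, offset: 0, index: 0)

        // Encode a whole batch of time steps into this one encoder.
        let batch = min(stepsPerBuffer, totalSteps - done)
        for _ in 0..<batch {
            encoder.dispatchThreadgroups(threadgroupsPerGrid,
                                         threadsPerThreadgroup: threadsPerThreadgroup)
        }
        encoder.endEncoding()
        cmds.commit()          // Submit without waiting.
        lastBuffer = cmds
        done += batch
    }

    // Command buffers on one queue run in order, so waiting on the
    // last one is enough before reading results back on the CPU.
    lastBuffer?.waitUntilCompleted()

    let pointer = buffer.contents().bindMemory(to: Float.self, capacity: count)
    return Array(UnsafeBufferPointer(start: pointer, count: count))
}
```

A call such as `runBatched(totalSteps: 10_000, stepsPerBuffer: 100)` would be one starting point; the best value of `stepsPerBuffer` depends on the kernels and hardware, so it is worth timing a few sizes.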


Try to use a single encoder with a single command buffer. As Audulus said, avoid waitUntilCompleted until you actually need data back from your buffers in your app.
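A rough sketch of the single-encoder approach, under the same caveats: it assumes the current Swift Metal API labels (`index:`), and the kernels `force_pass` and `euler_pass`, the function `simulate`, and the buffer names are hypothetical stand-ins for the real force and Euler kernels. It relies on a compute encoder's default serial dispatch behaviour, so each dispatch sees the results of the one before it.

```swift
import Metal

// Hypothetical stand-in kernels for the force and Euler passes.
let kernels = """
#include <metal_stdlib>
using namespace metal;
kernel void force_pass(device const float *positions [[buffer(0)]],
                       device float *forces [[buffer(1)]],
                       uint id [[thread_position_in_grid]]) {
    forces[id] = -positions[id];
}
kernel void euler_pass(device float *positions [[buffer(0)]],
                       device const float *forces [[buffer(1)]],
                       uint id [[thread_position_in_grid]]) {
    positions[id] += 0.5f * forces[id];
}
"""

// Encodes every step into one encoder on one command buffer.
// Returns nil if Metal is unavailable or setup fails.
func simulate(steps: Int, start: Float) -> [Float]? {
    let count = 256
    let byteCount = count * MemoryLayout<Float>.stride
    let initial = [Float](repeating: start, count: count)
    guard let device = MTLCreateSystemDefaultDevice(),
          let queue = device.makeCommandQueue(),
          let library = try? device.makeLibrary(source: kernels, options: nil),
          let forceFn = library.makeFunction(name: "force_pass"),
          let eulerFn = library.makeFunction(name: "euler_pass"),
          let forcePipeline = try? device.makeComputePipelineState(function: forceFn),
          let eulerPipeline = try? device.makeComputePipelineState(function: eulerFn),
          let positions = device.makeBuffer(bytes: initial, length: byteCount,
                                            options: .storageModeShared),
          let forces = device.makeBuffer(length: byteCount, options: .storageModeShared),
          let cmds = queue.makeCommandBuffer(),
          let encoder = cmds.makeComputeCommandEncoder()
    else { return nil }

    let threadsPerThreadgroup = MTLSize(width: 64, height: 1, depth: 1)
    let threadgroupsPerGrid = MTLSize(width: count / 64, height: 1, depth: 1)

    // Bind the buffers once; the bindings persist across pipeline changes.
    encoder.setBuffer(positions, offset: 0, index: 0)
    encoder.setBuffer(forces, offset: 0, index: 1)

    for _ in 0..<steps {
        encoder.setComputePipelineState(forcePipeline)
        encoder.dispatchThreadgroups(threadgroupsPerGrid,
                                     threadsPerThreadgroup: threadsPerThreadgroup)
        encoder.setComputePipelineState(eulerPipeline)
        encoder.dispatchThreadgroups(threadgroupsPerGrid,
                                     threadsPerThreadgroup: threadsPerThreadgroup)
    }
    encoder.endEncoding()

    cmds.commit()
    // Wait only now, because the CPU is about to read the positions.
    cmds.waitUntilCompleted()

    let pointer = positions.contents().bindMemory(to: Float.self, capacity: count)
    return Array(UnsafeBufferPointer(start: pointer, count: count))
}
```

With these stand-in kernels each step halves the position, so `simulate(steps: 3, start: 8)` should leave every element at 1.0. For very long runs a single command buffer may become too large, in which case splitting the steps across several command buffers and waiting only on the last is the natural next step.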