Calling compute kernels multiple times

Question

We are working on a numerical model in which we need to call compute kernels repeatedly, tens of thousands of times.

What is the best way to do this with Metal?

The following is an example of what we currently do. I can't imagine this is the best way.

        for i in 0 ..< 10000000000000 {
            print("loop: ", i)
            let cmds = queue.makeCommandBuffer()

            let encoder = cmds.makeComputeCommandEncoder()
            encoder.setComputePipelineState(ForcePipeline)
            encoder.setBytes(&numberOfParticles, length: MemoryLayout<Int>.stride, at: 0)
            encoder.setBuffer(massBuffer, offset: 0, at: 1)
            encoder.setBuffer(springkBuffer, offset: 0, at: 2)
            encoder.setBuffer(radiusBuffer, offset: 0, at: 3)
            encoder.setBuffer(positionOldBuffer, offset: 0, at: 4)
            encoder.setBuffer(forceBuffer, offset: 0, at: 5)
            encoder.dispatchThreadgroups(threadgroupsPerGrid, threadsPerThreadgroup: threadsPerThreadgroup)
            encoder.endEncoding()

            let encoder2 = cmds.makeComputeCommandEncoder()
            encoder2.setComputePipelineState(EulerPipeline)
            encoder2.setBytes(&numberOfParticles, length: MemoryLayout<Int>.stride, at: 0)
            encoder2.setBuffer(massBuffer, offset: 0, at: 1)
            encoder2.setBuffer(forceBuffer, offset: 0, at: 2)
            encoder2.setBuffer(velocityOldBuffer, offset: 0, at: 3)
            encoder2.setBuffer(positionOldBuffer, offset: 0, at: 4)
            encoder2.setBuffer(velocityNewBuffer, offset: 0, at: 5)
            encoder2.setBuffer(positionNewBuffer, offset: 0, at: 6)
            encoder2.dispatchThreadgroups(threadgroupsPerGrid, threadsPerThreadgroup: threadsPerThreadgroup)
            encoder2.endEncoding()

            cmds.commit()
            cmds.waitUntilCompleted()
        }

Metal

Posted by

Hustrulid

Answer 1

Encode more than two commands into each command buffer, and experiment to find the right batch size. Only call waitUntilCompleted on the last command buffer that you encode.
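
One way to read this advice, as a minimal sketch rather than a definitive implementation: pack several simulation steps into each command buffer and block only on the final one. It relies on command buffers committed to the same queue executing in commit order. The names `batchRanges`, `runBatched`, `stepsPerBuffer`, and the `encodeStep` closure are hypothetical and do not come from the question, and the sketch uses the current Swift Metal API, where `makeCommandBuffer()` returns an optional.

```swift
import Metal

// Hypothetical helper: splits `totalSteps` into consecutive ranges of at most
// `stepsPerBuffer` steps, one range per command buffer.
func batchRanges(totalSteps: Int, stepsPerBuffer: Int) -> [Range<Int>] {
    precondition(totalSteps >= 0 && stepsPerBuffer > 0)
    return stride(from: 0, to: totalSteps, by: stepsPerBuffer).map { start in
        start ..< min(start + stepsPerBuffer, totalSteps)
    }
}

// Hypothetical driver: `encodeStep` is assumed to encode one full simulation
// step (for example the force and Euler dispatches) into the given buffer.
func runBatched(totalSteps: Int,
                stepsPerBuffer: Int,
                queue: MTLCommandQueue,
                encodeStep: (MTLCommandBuffer) -> Void) {
    var lastBuffer: MTLCommandBuffer?
    for batch in batchRanges(totalSteps: totalSteps, stepsPerBuffer: stepsPerBuffer) {
        guard let cmds = queue.makeCommandBuffer() else { break }
        for _ in batch {
            encodeStep(cmds)
        }
        cmds.commit()        // submit without waiting; the CPU keeps encoding
        lastBuffer = cmds
    }
    // Block once, on the last committed buffer only.
    lastBuffer?.waitUntilCompleted()
}
```

Trying a few values of `stepsPerBuffer` and timing the whole run is probably the simplest way to find a good size for a given model.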

Posted by

Audulus

Answer 2

Also, there doesn't appear to be any reason to split your calls to dispatchThreadgroups across separate encoders.

Try to use a single encoder with a single command buffer. As Audulus said, avoid waitUntilCompleted until you actually need data back from your buffers in your app.
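
A minimal sketch of what that could look like, under some assumptions: both kernels are encoded through one compute encoder in one command buffer, and the encoder uses the default serial dispatch type, so the Euler dispatch runs after the force dispatch. `SimulationResources`, `encodeSimulationStep`, and `runInOneEncoder` are hypothetical names that simply bundle the variables from the question; the code uses the current `index:` argument labels rather than the older `at:` labels shown there.

```swift
import Metal

// Hypothetical bundle of the pipelines and buffers named in the question.
struct SimulationResources {
    let forcePipeline, eulerPipeline: MTLComputePipelineState
    let massBuffer, springkBuffer, radiusBuffer, forceBuffer: MTLBuffer
    let positionOldBuffer, positionNewBuffer: MTLBuffer
    let velocityOldBuffer, velocityNewBuffer: MTLBuffer
    let numberOfParticles: Int
    let threadgroupsPerGrid, threadsPerThreadgroup: MTLSize
}

// Encodes one force pass and one Euler pass into an already open encoder.
func encodeSimulationStep(_ r: SimulationResources,
                          into encoder: MTLComputeCommandEncoder) {
    var count = r.numberOfParticles

    encoder.setComputePipelineState(r.forcePipeline)
    encoder.setBytes(&count, length: MemoryLayout<Int>.stride, index: 0)
    encoder.setBuffer(r.massBuffer, offset: 0, index: 1)
    encoder.setBuffer(r.springkBuffer, offset: 0, index: 2)
    encoder.setBuffer(r.radiusBuffer, offset: 0, index: 3)
    encoder.setBuffer(r.positionOldBuffer, offset: 0, index: 4)
    encoder.setBuffer(r.forceBuffer, offset: 0, index: 5)
    encoder.dispatchThreadgroups(r.threadgroupsPerGrid,
                                 threadsPerThreadgroup: r.threadsPerThreadgroup)

    // Same encoder, second kernel. Indices 0 (count) and 1 (mass) keep their
    // bindings from above, so only the arguments that differ are rebound.
    encoder.setComputePipelineState(r.eulerPipeline)
    encoder.setBuffer(r.forceBuffer, offset: 0, index: 2)
    encoder.setBuffer(r.velocityOldBuffer, offset: 0, index: 3)
    encoder.setBuffer(r.positionOldBuffer, offset: 0, index: 4)
    encoder.setBuffer(r.velocityNewBuffer, offset: 0, index: 5)
    encoder.setBuffer(r.positionNewBuffer, offset: 0, index: 6)
    encoder.dispatchThreadgroups(r.threadgroupsPerGrid,
                                 threadsPerThreadgroup: r.threadsPerThreadgroup)
}

// Encodes `steps` iterations with one command buffer and one encoder, and
// waits only once, when the results are needed on the CPU.
// As in the question, swapping the old/new buffers between steps is not shown.
func runInOneEncoder(steps: Int, resources: SimulationResources, queue: MTLCommandQueue) {
    guard let cmds = queue.makeCommandBuffer(),
          let encoder = cmds.makeComputeCommandEncoder() else { return }
    for _ in 0..<steps {
        encodeSimulationStep(resources, into: encoder)
    }
    encoder.endEncoding()
    cmds.commit()
    cmds.waitUntilCompleted()
}
```

For very long runs, a single command buffer may become impractically large, so this would likely be combined with splitting the steps across several command buffers.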

Posted by

Graphics and Games Engineer
