Simple Metal apps without buffers and semaphore?

Metal Best Practices Guide states that

The setVertexBytes:length:atIndex: method is the best option for binding a very small amount (less than 4 KB) of dynamic buffer data to a vertex function

I believe this means that for simple scenes, instead of storing uniforms in a manually managed dynamic buffer, it's best to update the model/view/projection matrices without using any buffer at all, via setVertexBytes:length:atIndex: and setFragmentBytes:length:atIndex:.
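As a sketch of what that looks like (the `Uniforms` struct, its member, and the `renderEncoder` variable are names I'm making up for illustration, not from the guide):

```objc
#import <simd/simd.h>

// Per-frame uniforms, well under the 4 KB limit for set*Bytes:.
typedef struct {
    matrix_float4x4 modelViewProjection;
} Uniforms;

Uniforms uniforms;
uniforms.modelViewProjection = matrix_identity_float4x4; // this frame's MVP

// No MTLBuffer is created by us: Metal copies these bytes into the
// command buffer's own storage at encode time.
[renderEncoder setVertexBytes:&uniforms
                       length:sizeof(uniforms)
                      atIndex:1];
```

Because the bytes are copied at encode time, the CPU is free to overwrite `uniforms` immediately afterwards without synchronizing against the GPU.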


My question is: in this case, since there is no dynamic buffer at all (only static vertex data), what are we calling triple buffering? Is it simply that we still have a semaphore with value: 3?


Moreover, what if I remove the semaphore entirely? The app seems to work fine. But what is actually happening in this case? Am I getting better or worse latency compared to a semaphore with value: 3?


Since the render loop is limited to 60 FPS and the frame time is about 1.5 ms for the CPU (in the case of a simple example), some command has to take the place of the blocking semaphore, right? Is Metal going into double buffering in this case (the GPU displays one frame while the CPU is encoding the next)?

Accepted Reply

Well, technically there are dynamic buffers. They are created "behind the scenes" for you by Metal. There is no need for any synchronization, because each buffer is created once with content sent by the CPU, used by the GPU once, and then discarded. A similar technique was called buffer "orphaning" in OpenGL, I believe.


You can get rid of the semaphores. There is no "triple" buffering, or any "n"-buffering, since there is no fixed number of buffers being re-used. Technically, memory is of course re-used (or you'd run out of it eventually), but not buffers.


Better or worse latency... well, the way I understand it (not sure if 100% correct) is that for small amounts of data (they say 4 KB, which happens to be the page size on at least Intel CPUs) it doesn't matter whether you send only a few bytes or, say, 3 KB from CPU to GPU. So instead of sending just a draw command, you pass the associated data along with it, and as long as that data is small, there is no additional cost involved. So this is as fast as it gets.


Renderers usually double- (or more) buffer their output, yes. But do not confuse output (render) double buffering (meant to avoid writing to the drawable being displayed, i.e. to avoid so-called "tearing" artifacts) with upload (CPU-to-GPU) many-buffering. They are not the same.


Hope that helps

Michal

Replies


OK, I thought that when we have 3 buffers and a semaphore with value: 3, we are doing triple buffering, and with 2 we have double buffering. But now you are explaining that the GPU has a "hidden" double buffering as well? Or is it just a framebuffer for v-sync?


Anyway, the practical part I'm interested in is what strategy would provide the best input-to-photon latency. Say you make the screen white when the user presses a button or touches the screen. What kind of Metal application would give the lowest latency here?

"Since the render loop is limited to 60 FPS and the frame time is about 1.5 ms for the CPU (in case of a simple example), some command has to take the place of the blocking semaphore, right?"


That's right. It is when your code tries to get the next drawable that the call will block, to prevent you from running past the FPS limit. However, this type of blocking is unpredictable, and your code won't have a reliable way to determine how many buffers to cycle among. Your code will occasionally run into buffer corruption.

Why would it be unpredictable? Just some data is added to the encoder. There are no buffers, just copies of small amounts of data from CPU to GPU memory.

But I've only been playing with apps like this, so I have no idea how it will work in production.

Your shader and your encoding code run asynchronously, on the GPU and on the CPU respectively. You need to use a different buffer when encoding each frame, or the encoding code will corrupt what your shader is still reading. Since you cannot have infinitely many buffers, you have to cycle through a finite number of them. Without a proper semaphore, you have no way to decide how to cycle through them safely.
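For reference, the usual shape of that semaphore pattern (a sketch only; `kMaxFramesInFlight`, `_uniformBuffers`, and `_frameIndex` are illustrative names, not from this thread) looks something like:

```objc
static const NSUInteger kMaxFramesInFlight = 3;
// Created once at setup:
//   _semaphore = dispatch_semaphore_create(kMaxFramesInFlight);
//   _uniformBuffers = kMaxFramesInFlight MTLBuffers.

- (void)drawInMTKView:(MTKView *)view {
    // Block until one of the in-flight frames has finished on the GPU.
    dispatch_semaphore_wait(_semaphore, DISPATCH_TIME_FOREVER);

    _frameIndex = (_frameIndex + 1) % kMaxFramesInFlight;
    id<MTLBuffer> uniformBuffer = _uniformBuffers[_frameIndex];
    // ... write this frame's uniforms into uniformBuffer ...

    id<MTLCommandBuffer> commandBuffer = [_commandQueue commandBuffer];
    __block dispatch_semaphore_t semaphore = _semaphore;
    [commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> cb) {
        // GPU is done reading this frame's buffer; release it for reuse.
        dispatch_semaphore_signal(semaphore);
    }];
    // ... encode, present, commit ...
}
```

The semaphore count is what bounds the number of frames in flight, which is exactly what gets lost if you drop it while still sharing MTLBuffers between CPU and GPU.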

If you're using set[Vertex|Fragment]Bytes to send data to your shaders (and thus the GPU), it won't be unpredictable. But if you're using a single buffer with set[Vertex|Fragment]Buffer to send data, then when your application writes to the buffer (i.e. with the CPU) while your shader (i.e. the GPU) reads from that same buffer, and there's no synchronization, that write could occur before or after the GPU has read the data. So there's no telling whether the GPU will read the value that was there before the write or the one written after.


The CPU and GPU Synchronization Metal sample discusses this topic.

Just found that if I do not add the last line of the following code, the app may crash, so for a simple usage of MetalKit, blocking until the command buffer completes is a good idea.

[renderEncoder endEncoding];

// Schedule a present once the framebuffer is complete, using the current drawable.
[commandBuffer presentDrawable:self.view.currentDrawable];
[commandBuffer commit];

// Block until the command buffer completes.
[commandBuffer waitUntilCompleted];