Is it necessary to synchronize the CPU and GPU?

I am writing a game engine that creates an MTLBuffer and updates its contents every frame. The MTLBuffer is used in many draw calls; the code looks like this:


// draw call 1
memcpy((uint8_t *)mtlbuffer.contents + offset1, data, size);
[_mtlRenderEncoder drawIndexedPrimitives:toMTLPrimitive(primitiveType)
                              indexCount:count
                               indexType:toMTLIndexType(indexType)
                             indexBuffer:mtlbuffer
                       indexBufferOffset:offset1];

// draw call 2
memcpy((uint8_t *)mtlbuffer.contents + offset2, data, size);
[_mtlRenderEncoder drawIndexedPrimitives:toMTLPrimitive(primitiveType)
                              indexCount:count
                               indexType:toMTLIndexType(indexType)
                             indexBuffer:mtlbuffer
                       indexBufferOffset:offset2];

// submit the frame's commands to the GPU
[_mtlCommandBuffer commit];


So I have two questions:

1. Can I use the same MTLBuffer in two draw calls with different offsets?

2. Should I wait to update the MTLBuffer until the GPU has executed all the commands? If I don't, will the data be modified while the GPU is still using the buffer?


Thanks in advance.

Accepted Reply

Using the same MTLBuffer with incrementing offsets for each draw call is the strongly suggested design pattern. One buffer per draw gets expensive quickly.


There are two aspects to consider to ensure correct behavior:


1) Within a frame, you may need to tell Metal that you modified the contents of the buffer so that it can be uploaded to VRAM, assuming you're targeting macOS. (If you target iOS/tvOS only, you can ignore this.) This page describes the choice between allocating a buffer with StorageModeShared or StorageModeManaged. Selecting Shared (system memory allocation only) can be slower on discrete GPUs. Selecting Managed requires one additional API call (MTLBuffer.didModifyRange) to inform Metal that you've made a modification and that the range needs to be uploaded again. This WWDC video also describes the behavior.


2) Neither Metal nor Core Animation protects you from the CPU overwriting the buffer while the GPU is still executing previously submitted commands. The document CPU and GPU Synchronization describes this problem in greater detail and explains how to handle it correctly. There is also a WWDC video on the topic.
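
To make the two points above concrete, here is a minimal sketch in plain C of the offset bookkeeping only. It is not taken from Apple's sample code. It assumes the single MTLBuffer is split into kFramesInFlight regions, one per frame the CPU may record ahead of the GPU. FrameRing, ring_init, ring_begin_frame, and ring_alloc are hypothetical names introduced for illustration, and the value 3 is an assumption. The actual waiting is not shown: in Metal it is commonly done with a dispatch_semaphore that is created with a count of kFramesInFlight, waited on at the start of each frame, and signaled from the block passed to the command buffer's addCompletedHandler:.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed number of frames the CPU may record ahead of the GPU. */
enum { kFramesInFlight = 3 };

/* Hypothetical bookkeeping for one MTLBuffer split into per-frame regions. */
typedef struct {
    size_t regionSize;    /* bytes reserved for each frame */
    size_t regionBase;    /* start of the current frame's region */
    size_t cursor;        /* bytes used so far in the current region */
    unsigned frameIndex;  /* number of frames begun so far */
} FrameRing;

static void ring_init(FrameRing *r, size_t bufferLength)
{
    r->regionSize = bufferLength / kFramesInFlight;
    r->regionBase = 0;
    r->cursor = 0;
    r->frameIndex = 0;
}

/* Call once per frame, AFTER waiting until the GPU has finished the frame
   that last used this region (the semaphore wait is not shown here). */
static void ring_begin_frame(FrameRing *r)
{
    r->regionBase = (r->frameIndex % kFramesInFlight) * r->regionSize;
    r->cursor = 0;
    r->frameIndex++;
}

/* Reserves `size` bytes for one draw call. Returns the byte offset into the
   whole buffer, or (size_t)-1 if the current frame's region is full. */
static size_t ring_alloc(FrameRing *r, size_t size, size_t alignment)
{
    assert(alignment > 0);
    size_t end = r->regionBase + r->regionSize;
    size_t start = r->regionBase + r->cursor;
    size_t aligned = (start + alignment - 1) / alignment * alignment;
    if (aligned > end || size > end - aligned)
        return (size_t)-1;
    r->cursor = aligned + size - r->regionBase;
    return aligned;
}
```

Each offset returned by ring_alloc would serve both as the memcpy destination offset into mtlbuffer.contents and as the indexBufferOffset of the matching draw call; the alignment to pass depends on the offset requirements of the draw call in question. With a Managed buffer, one call such as [mtlbuffer didModifyRange:NSMakeRange(ring.regionBase, ring.cursor)] before committing would cover everything written during the frame, because all of the frame's writes fall inside the used prefix of its region.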

Replies

To add more information: the engine calls [CAMetalLayer nextDrawable] at the beginning of rendering each frame. Does that call do the synchronization work?


Thanks Frogblast. It helps a lot.