[Metal] Fastest way to copy device data to threadgroup memory?

To optimize buffer reads, I intend to use threadgroup memory.

But it seems:
(1) There is no API like std::memcpy in MSL;
(2) Also, there is no API like [setBuffer: atIndex:] to set data for threadgroup memory.

The amount of data is 2 to 4 KB. What is the fastest way to copy data from device memory to threadgroup memory? Thanks!

Answered by Graphics and Games Engineer in 711479022

The usual pattern is to have the threads collaborate to load the threadgroup memory (e.g. a few bytes copied per thread), then issue a threadgroup barrier.
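
As a rough illustration of that pattern, here is a minimal MSL sketch. The kernel name, the buffer indices, the `kTileCount` constant (1024 floats = 4 KB, the upper end of the size in the question), and the final write to `dst` are all assumptions made for the example. It also assumes `src` holds at least `kTileCount` floats and `dst` has one element per thread in the grid.

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical tile size: 1024 floats = 4 KB.
constant constexpr uint kTileCount = 1024;

kernel void process_with_tile(device const float *src [[buffer(0)]],
                              device float       *dst [[buffer(1)]],
                              uint tid     [[thread_index_in_threadgroup]],
                              uint tg_size [[threads_per_threadgroup]],
                              uint gid     [[thread_position_in_grid]])
{
    // Statically sized threadgroup storage shared by all threads in the group.
    threadgroup float tile[kTileCount];

    // Cooperative load: thread `tid` copies elements tid, tid + tg_size, ...
    // so neighbouring threads read neighbouring device addresses.
    for (uint i = tid; i < kTileCount; i += tg_size) {
        tile[i] = src[i];
    }

    // Wait until every thread in the threadgroup has finished its part of the copy.
    threadgroup_barrier(mem_flags::mem_threadgroup);

    // Placeholder use of the shared data (an assumption, just to make the kernel complete).
    dst[gid] = tile[gid % kTileCount];
}
```

The strided loop works for any threadgroup size, so it does not depend on the tile size being a multiple of the number of threads. Copying a wider type such as `float4` per iteration may reduce the number of load instructions, but whether that helps is worth measuring on the target GPU.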

