I have a Metal kernel function that takes a huge array of data as input, stored in device memory, and I'm basically using one element per thread for further processing.
device Element *elements [[ buffer(0) ]],
I'm wondering which is better in terms of performance:
Make a copy of the array element in thread-local memory:
Element element = elements[thread_id];
Or use a pointer to that element:
device Element *element = &elements[thread_id];
In most cases, regardless of which approach you take, the Metal compiler will produce optimized code, reducing the number of memory operations and hardware registers used. It is reasonable to expect a very similar performance profile from both.
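To make the comparison concrete, here is a minimal sketch of the two forms from the question written as complete kernels. The layout of Element, its position and velocity fields, the kernel names, and the update performed are all assumptions made for illustration; the original post does not show them.

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical element layout; the original post does not show it.
struct Element {
    float4 position;
    float4 velocity;
};

// Variant A: copy the element into a thread-local value,
// modify the copy, then store it back to device memory.
kernel void update_by_copy(device Element *elements [[ buffer(0) ]],
                           uint thread_id [[ thread_position_in_grid ]])
{
    Element element = elements[thread_id];
    element.position += element.velocity;
    elements[thread_id] = element;   // explicit write-back is required
}

// Variant B: read and write through a pointer into device memory.
kernel void update_by_pointer(device Element *elements [[ buffer(0) ]],
                              uint thread_id [[ thread_position_in_grid ]])
{
    device Element *element = &elements[thread_id];
    element->position += element->velocity;
}
```

One difference that is about correctness rather than speed: with the copy, changes stay in the local value until you assign it back to elements[thread_id], while changes made through the pointer go straight to the buffer.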
However, the performance of your code is determined not only by how you read values from the input buffers, but also by how those values are used in the shader. If you have any reason to believe you are leaving performance on the table, we recommend profiling your app using GPU counters. This will give you a deep understanding of the code Metal generated for your shader and let you optimize for your specific case.
The counters you may want to check first are the limiter counters, which show whether your app is ALU limited or buffer-read limited. For more information on how to use them, please watch this great presentation:
https://developer.apple.com/videos/play/wwdc2020/10603/