Someone in labs suggested I post here.
I have this shader, which is derived from an old Apple sample project. I can post my updated project if that would help; it is also attached to FB7741493.
```metal
kernel void add_arrays(constant const float* inA,
                       constant const float* inB,
                       device float* result,
                       uint index [[thread_position_in_grid]])
{
    // work around "prevented device address mode store/load"
    int index_signed = index;
    result[index_signed] = inA[index_signed] + inB[index_signed];
}
```
Xcode gives me the following remark:
```
Buffer Preloading Failed
Make sure your data size is a multiple of 4 bytes and aligned to 4 bytes and try using a simple access pattern. For constant buffers, try using a fixed buffer size.
inA could not be promoted
inB could not be promoted
result could not be promoted
```
What is this remark trying to warn me about? Floats are 4 bytes and aligned to 4 bytes. This is a very simple access pattern.
Is this the best practice for laying out data for this kind of computation, or how could it be improved? I also tried using texture buffers (no remark, but not clearly faster and often slower) and providing the input data via [[stage_in]] (which shows a similar remark).
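One possible reading of the remark's "fixed buffer size" hint is sketched below. It is untested against this project, so it may not clear the remark. The idea is to give the compiler a compile-time size by wrapping each `constant` input in a struct with a bounded array and passing it by reference, instead of passing an unsized pointer. The names `ELEMENT_COUNT`, `FloatBlock`, and `add_arrays_fixed` are hypothetical, as are the element count of 1024 and the buffer indices.

```metal
#include <metal_stdlib>
using namespace metal;

// Assumption: the element count is known at compile time.
// 1024 is an illustrative value, not taken from the original project.
#define ELEMENT_COUNT 1024

// Hypothetical wrapper that gives each input buffer a fixed, known size.
struct FloatBlock {
    float values[ELEMENT_COUNT];
};

kernel void add_arrays_fixed(constant FloatBlock& inA [[buffer(0)]],
                             constant FloatBlock& inB [[buffer(1)]],
                             device float* result     [[buffer(2)]],
                             uint index [[thread_position_in_grid]])
{
    // Guard against grids that are larger than the fixed array.
    if (index >= ELEMENT_COUNT) {
        return;
    }
    result[index] = inA.values[index] + inB.values[index];
}
```

In this kernel every thread reads a different element, which is not the access pattern the `constant` address space is usually chosen for. Declaring the inputs as `device const float*` may be a better fit than forcing a fixed size.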