Buffer preloading failed

Someone in the labs suggested I post here.

I have this shader, which is derived from an old Apple sample project. I can post my updated project if that would help; it's also attached to FB7741493.

```metal
#include <metal_stdlib>
using namespace metal;

kernel void add_arrays(constant const float* inA,
                       constant const float* inB,
                       device float* result,
                       uint index [[thread_position_in_grid]]) {
    // Index with a signed integer to work around the
    // "prevented device address mode store/load" remark.
    int index_signed = index;
    result[index_signed] = inA[index_signed] + inB[index_signed];
}
```

Xcode gives me the following remark:

```
Buffer Preloading Failed
Make sure your data size is a multiple of 4 bytes and aligned to 4 bytes and try using a simple access pattern. For constant buffers, try using a fixed buffer size.
inA could not be promoted
inB could not be promoted
result could not be promoted
```

What is this remark trying to warn me about? Floats are 4 bytes and aligned to 4 bytes. This is a very simple access pattern.

Is this the best practice for laying out data for this kind of computation, or how could it be improved? I also tried using texture buffers (they don't produce this remark, but they aren't clearly faster and are often slower) and providing the input data through [[stage_in]] (which produces a similar remark).

Thanks for bringing this to our attention. The remark indicates that the content of the buffer cannot be preloaded, most likely because the shader doesn't give the compiler enough information to determine how much data will be accessed. Note that these remarks depend on the hardware, so they may differ on newer SoCs. Also note that these remarks do not necessarily indicate a performance issue, and that preloading may not be appropriate for every buffer, including the ones in your case. We are looking into improving this as part of the enhancement request we received.
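The remark's own hint, "For constant buffers, try using a fixed buffer size", suggests giving the compiler a known extent for each constant buffer. The sketch below illustrates that idea under an explicit assumption: the element count is fixed at compile time (the `ELEMENT_COUNT` macro, the `FixedFloatArray` struct, and the `add_arrays_fixed` name are hypothetical, not from the original project). It has not been verified that this removes the remark on any particular GPU, and as the reply above notes, a failed preload is not necessarily a performance problem.

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical element count known at compile time (an assumption for this
// sketch); the host must bind input buffers holding exactly this many floats.
#define ELEMENT_COUNT 1024

// Wrapping the data in a fixed-size array gives the compiler the exact
// number of bytes each constant buffer spans.
struct FixedFloatArray {
    float values[ELEMENT_COUNT];
};

kernel void add_arrays_fixed(constant FixedFloatArray& inA [[buffer(0)]],
                             constant FixedFloatArray& inB [[buffer(1)]],
                             device float* result [[buffer(2)]],
                             uint index [[thread_position_in_grid]]) {
    // Threads past the fixed size do nothing, so the grid may be rounded up.
    if (index >= ELEMENT_COUNT) {
        return;
    }
    result[index] = inA.values[index] + inB.values[index];
}
```

The trade-off is flexibility: this kernel handles only one array length, whereas the original pointer-based version works for any length. If the data size varies at run time, the original layout is likely the more practical choice.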