I'm running into an issue with threadgroup memory where data written to it seemingly gets lost when I use int8_t or int16_t element types:
#include <metal_stdlib>
using namespace metal;
kernel void kernel_function(device int16_t* R, uint index [[thread_position_in_threadgroup]]) {
    threadgroup int16_t shared[1];
    shared[index] = (int16_t) 42;
    threadgroup_barrier(mem_flags::mem_threadgroup);
    R[0] = shared[index];
}
If I execute this kernel (using the host code at https://gist.github.com/maleadt/ffcda8fc94f03f32347c3167ccca70a8), I get zeros in my output buffer. If I change the element type from int16_t to int32_t (a plain find/replace in both the kernel and the host code), I get the expected results.
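For reference, this is roughly what the int32_t variant looks like after the find/replace. It is only a sketch of the kernel side, and it assumes the output buffer in the host code has been switched to a 32-bit element type as well:

```metal
#include <metal_stdlib>
using namespace metal;

// Same kernel as above, with int16_t replaced by int32_t everywhere.
// Assumption: the host-side buffer bound to R also uses 32-bit elements.
kernel void kernel_function(device int32_t* R, uint index [[thread_position_in_threadgroup]]) {
    // One-element staging buffer in threadgroup memory.
    threadgroup int32_t shared[1];
    shared[index] = (int32_t) 42;
    // Make the threadgroup write visible before reading it back.
    threadgroup_barrier(mem_flags::mem_threadgroup);
    R[0] = shared[index];
}
```

The only difference between the two kernels is the element type, which is why I suspect the handling of 8-bit and 16-bit threadgroup stores rather than the barrier or the indexing.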
I'm new to Metal, so I guess I'm doing something wrong here. I'm using an M1 Pro on Monterey, with Xcode 13.4.1.
EDIT: interestingly, running under MTL_SHADER_VALIDATION=1 results in the expected output, so this does start to look like a miscompilation in the back-end.