Hello,
I'm trying to optimize code that loads half2 vectors from threadgroup (or constant) memory. For example:
// Option A: read once(?) and then unpack
#define load_4half2(x, y, z, w, p, i) do { \
    uint4 readU4 = *((threadgroup uint4*)((p) + (i))); \
    x = as_type<half2>(readU4.x); /* reinterpret 32 bits as two halves */ \
    y = as_type<half2>(readU4.y); \
    z = as_type<half2>(readU4.z); \
    w = as_type<half2>(readU4.w); \
} while (0)
// Option B: read one by one
#define load_4half2(x, y, z, w, p, i) do { \
    threadgroup half2* readU4 = ((threadgroup half2*)((p) + (i))); \
    x = readU4[0]; \
    y = readU4[1]; \
    z = readU4[2]; \
    w = readU4[3]; \
} while (0)
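To make the unpack step in option A concrete, here is a minimal CPU-side sketch in plain C++ of what a bit reinterpretation from a 32-bit word to two 16-bit halves does. It is only an analogue, not Metal code: Half2Bits, bits_to_half2 and half2_to_bits are hypothetical names made up for this illustration, and the halves are held as raw 16-bit patterns rather than as real half values.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for half2: two raw 16-bit half bit patterns.
struct Half2Bits {
    std::uint16_t x;
    std::uint16_t y;
};

static_assert(sizeof(Half2Bits) == sizeof(std::uint32_t),
              "Half2Bits must be exactly 32 bits wide");

// Reinterpret a 32-bit word as two halves. The bits are copied unchanged;
// no numeric conversion takes place.
inline Half2Bits bits_to_half2(std::uint32_t word) {
    Half2Bits h;
    std::memcpy(&h, &word, sizeof h);
    return h;
}

// Inverse direction: pack two halves back into one 32-bit word.
inline std::uint32_t half2_to_bits(Half2Bits h) {
    std::uint32_t word;
    std::memcpy(&word, &h, sizeof word);
    return word;
}
```

Packing a pair with half2_to_bits and unpacking it with bits_to_half2 returns the same two bit patterns, which is the property option A depends on. This says nothing about which option is faster on the GPU.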
I haven't figured out how to get the disassembled code, so I'm not sure which is the better solution. Could anyone kindly shed some light on this?
Thanks a lot!
Hello,
I used outTexture.write(half4(hx, 0, 0, 0), uint2(x, y)) to write a pixel value to a texture, and then read it back into an MTLBuffer with the blit encoder's copyFromTexture. The integer values read from the MTLBuffer are not what I expected: for half values less than 128/256 I get the expected value, but for half values greater than 128/256 I get a smaller value. For example:
127.0/256; ==> 127
128.0/256; ==> 128
129.0/256; ==> 129
130.0/256; ==> 130
131.0/256; ==> 131
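For comparison, here is a minimal C++ sketch of one common way a normalized float is turned into an 8-bit value. It assumes the texture uses an 8-bit unsigned normalized format and that the conversion scales by 255 and rounds to nearest. Both are assumptions: the pixel format isn't stated above, and to_unorm8 is a hypothetical helper, not a Metal API.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper: convert a float in [0, 1] to an 8-bit unorm value.
// Assumed convention: clamp, scale by 255, round to nearest.
inline std::uint8_t to_unorm8(float v) {
    float clamped = std::min(std::max(v, 0.0f), 1.0f);
    return static_cast<std::uint8_t>(std::lround(clamped * 255.0f));
}
```

Under this assumed convention the scale factor is 255 rather than 256, so N/256 does not always map back to N. For instance, to_unorm8(129.0f / 256) gives 128.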
Any thoughts?
Thanks
Caijohn