Post | Replies | Boosts | Views | Activity

How to load half2 vectors from thread group memory faster?
Hello, I'm trying to optimize code that loads half2 vectors from threadgroup (or constant) memory. For example:

    // Option A: read once (?) as a uint4, then unpack
    #define load_4half2(x, y, z, w, p, i) do { \
        uint4 readU4 = *((threadgroup uint4*)(p + i)); \
        x = as_type<half2>(readU4.x); \
        y = as_type<half2>(readU4.y); \
        z = as_type<half2>(readU4.z); \
        w = as_type<half2>(readU4.w); \
    } while (0)

    // Option B: read one by one
    #define load_4half2(x, y, z, w, p, i) do { \
        threadgroup half2* readU4 = ((threadgroup half2*)(p + i)); \
        x = readU4[0]; \
        y = readU4[1]; \
        z = readU4[2]; \
        w = readU4[3]; \
    } while (0)

I haven't figured out how to get the "disassembled" code, so I'm not sure which is the better solution. Could anyone kindly help shed some light on this? Thanks a lot!
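As a point of reference, the two options should produce identical values; they differ only in how the reads are issued. The sketch below is a host-side C++ analogue, not Metal code: `Half2Bits`, `load_option_a` and `load_option_b` are hypothetical names, and the halves are kept as raw 16-bit patterns because standard C++ has no portable `half`. It shows that one 16-byte read followed by bit reinterpretation (the role `as_type<half2>` plays) yields the same data as four 4-byte reads. It says nothing about which version compiles to faster GPU code. Note also that in Metal, option A additionally assumes `p + i` is 16-byte aligned, since that is the alignment of `uint4`.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical host-side stand-in for Metal's half2: the raw bits of two halves.
struct Half2Bits {
    std::uint16_t x, y;
};
static_assert(sizeof(Half2Bits) == 4, "a half2 occupies 4 bytes");

// Analogue of option A: one 16-byte read, then reinterpret each 32-bit lane
// as a half2 (the job as_type<half2>(uint) does in Metal).
std::array<Half2Bits, 4> load_option_a(const Half2Bits* p, int i) {
    std::uint32_t lanes[4];
    std::memcpy(lanes, p + i, sizeof lanes);  // single 16-byte copy
    std::array<Half2Bits, 4> out{};
    for (int k = 0; k < 4; ++k) {
        // Bit-for-bit reinterpretation of one 32-bit lane.
        std::memcpy(&out[k], &lanes[k], sizeof(Half2Bits));
    }
    return out;
}

// Analogue of option B: four separate 4-byte reads.
std::array<Half2Bits, 4> load_option_b(const Half2Bits* p, int i) {
    std::array<Half2Bits, 4> out{};
    for (int k = 0; k < 4; ++k) {
        out[k] = p[i + k];
    }
    return out;
}
```

Because `memcpy` preserves byte layout, the two functions agree regardless of host endianness; any speed difference on the GPU would have to be measured with Apple's GPU tools rather than inferred from this.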
Replies: 0 | Boosts: 0 | Views: 272 | Activity: Sep ’23
Texture Write Rounding
Hello, I used outTexture.write(half4(hx, 0, 0, 0), uint2(x, y)) to write a pixel value to a texture, and then read it back into an MTLBuffer with a blit encoder's copyFromTexture. The integer values read from the MTLBuffer are not as expected: for half values less than 128/256 I get the expected value, but for half values greater than 128/256 I get a smaller value than expected. For example:

    127.0/256 ==> 127
    128.0/256 ==> 128
    129.0/256 ==> 129
    130.0/256 ==> 130
    131.0/256 ==> 131

Any thoughts?

Thanks,
Caijohn
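A possible explanation is worth working through arithmetically. The post doesn't state the pixel format, so the sketch below assumes an 8-bit unorm texture (for example `MTLPixelFormatR8Unorm`) and assumes the float-to-unorm8 conversion clamps to [0, 1], scales by 255 and rounds to nearest, ties to even. Under that assumption, k/256 becomes k × 255/256 = k − k/256 before rounding: the fractional offset k/256 stays below 0.5 for k < 128, so the result rounds back to k, but exceeds 0.5 for k > 128, so the result rounds down to k − 1 (e.g. 129/256 × 255 = 128.496…, which rounds to 128). Values of the form k/256 are exactly representable in half, so precision of `hx` is not the issue here. `to_unorm8` is a hypothetical host-side helper, not a Metal API.

```cpp
#include <cmath>
#include <cstdint>

// Assumption: the texture is an 8-bit unorm format and float -> unorm8
// conversion clamps to [0, 1], scales by 255 and rounds to nearest even.
std::uint8_t to_unorm8(float f) {
    float c = std::fmin(std::fmax(f, 0.0f), 1.0f);
    // std::nearbyint follows the default rounding mode (round to nearest even).
    return static_cast<std::uint8_t>(std::nearbyint(c * 255.0f));
}
```

If exact integers are the goal under this assumption, writing k/255.0 instead of k/256.0 would map back to k.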
Replies: 3 | Boosts: 0 | Views: 415 | Activity: Sep ’23