caijohn’s Profile | Apple Developer Forums

How can I load half2 vectors from threadgroup memory faster?

Hello, I'm trying to optimize code that loads half2 vectors from threadgroup (or constant) memory. For example:

```metal
// Option A: read once (as a single uint4?), then unpack
#define load_4half2(x, y, z, w, p, i) do { \
    uint4 readU4 = *((threadgroup uint4*)(p + i)); \
    x = as_type<half2>(readU4.x); \
    y = as_type<half2>(readU4.y); \
    z = as_type<half2>(readU4.z); \
    w = as_type<half2>(readU4.w); \
} while (0)

// Option B: read the half2 values one by one
#define load_4half2(x, y, z, w, p, i) do { \
    threadgroup half2* readH2 = (threadgroup half2*)(p + i); \
    x = readH2[0]; \
    y = readH2[1]; \
    z = readH2[2]; \
    w = readH2[3]; \
} while (0)
```

I haven't figured out how to get the "disassembled" code, so I'm not sure which is the better solution. Could anyone help shed some light on this? Thanks a lot!
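Both options should yield the same four half2 values; they differ only in how the loads are issued, and only the compiled shader or a profiler can say which is faster. The sketch below is host-side C++, not Metal, and only models the data movement. It assumes a little-endian layout, where as_type<half2> on a 32-bit word puts the lower-addressed half in .x, and it represents each half by its raw 16-bit pattern. The names (Half2Bits, load_4half2_wide, load_4half2_scalar, loaders_agree) are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model: a half is stored only as its 16-bit bit pattern.
struct Half2Bits { uint16_t x, y; };

// Option A analogue: one 16-byte copy into four 32-bit words, then split
// each word into (low, high) halves. Assumes a little-endian host, so the
// lower-addressed element ends up in the low 16 bits.
std::array<Half2Bits, 4> load_4half2_wide(const uint16_t* p) {
    uint32_t words[4];
    std::memcpy(words, p, sizeof(words));  // single 16-byte copy
    std::array<Half2Bits, 4> out{};
    for (int j = 0; j < 4; ++j) {
        out[j].x = static_cast<uint16_t>(words[j] & 0xFFFFu);
        out[j].y = static_cast<uint16_t>(words[j] >> 16);
    }
    return out;
}

// Option B analogue: read the four half2 elements one at a time.
std::array<Half2Bits, 4> load_4half2_scalar(const uint16_t* p) {
    std::array<Half2Bits, 4> out{};
    for (int j = 0; j < 4; ++j) {
        out[j].x = p[2 * j];
        out[j].y = p[2 * j + 1];
    }
    return out;
}

// Returns true if both loaders produce identical bit patterns.
bool loaders_agree(const uint16_t* p) {
    auto a = load_4half2_wide(p);
    auto b = load_4half2_scalar(p);
    for (int j = 0; j < 4; ++j)
        if (a[j].x != b[j].x || a[j].y != b[j].y) return false;
    return true;
}
```

Because the results match, the choice is purely about code generation: Option A requires p + i to be 16-byte aligned for the uint4 read, while Option B leaves it to the compiler whether the four 32-bit reads get merged into a wider load.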

Graphics & Games General Metal


Sep ’23

Texture Write Rounding

Hello, I used `outTexture.write(half4(hx, 0, 0, 0), uint2(x, y))` to write a pixel value to a texture and then read it back into an MTLBuffer with the blit encoder's `copyFromTexture`, but the integer values read from the MTLBuffer are not what I expected. For half values less than 128/256 I get the expected value, but for half values greater than 128/256 I get a smaller value. For example:

- 127.0/256 ==> 127
- 128.0/256 ==> 128
- 129.0/256 ==> 129
- 130.0/256 ==> 130
- 131.0/256 ==> 131

Any thoughts?

Thanks,
Caijohn
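One possible explanation can be checked with plain arithmetic. The post does not say which pixel format the texture uses, so the sketch below assumes an 8-bit unorm format (R8Unorm or RGBA8Unorm) whose store converts clamp(f, 0, 1) * 255 to the nearest integer, ties to even. That rounding rule is an assumption to verify against the Metal Shading Language specification. The functions to_unorm8, store_over_256 and store_over_255 are hypothetical helpers written in host-side C++. Modeling with double is exact here because k/256 for k in 0..255 is exactly representable in half.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical model of storing a shader value into an 8-bit unorm channel.
// Assumed conversion: clamp to [0, 1], scale by 255, round to nearest even.
uint8_t to_unorm8(double f) {
    double clamped = std::fmin(std::fmax(f, 0.0), 1.0);
    // std::nearbyint follows the current rounding mode, which defaults to
    // round-to-nearest with ties to even.
    return static_cast<uint8_t>(std::nearbyint(clamped * 255.0));
}

// Value written as k / 256 (as in the post) versus k / 255.
uint8_t store_over_256(int k) { return to_unorm8(k / 256.0); }
uint8_t store_over_255(int k) { return to_unorm8(k / 255.0); }
```

Under these assumptions, k/256 is stored as k - k/256 before rounding. For k <= 127 that rounds back to k, k = 128 lands exactly on 127.5 and rounds to 128, and for k >= 129 the result is k - 1. Writing k/255 instead recovers k for every k in 0..255.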

Graphics & Games General Metal


Sep ’23
