Post

Replies

Boosts

Views

Activity

Div calculation issue in metal
Hi, all. I've been writing various computational functions using Metal. However, in the following operation functions, unlike + and *, there is an accuracy issue in the / operation. This is a function that divides a matrix of shape [n, x, y] and a scalar [1]. When compared to numpy or torch, if I change the operator of the above function to * or + instead of /, I can get completely the same results, but in the case of /, there is a difference in the mean of more than 1e-5. (For reference, this was written with reference to the metal kernel code in llama.cpp) kernel void kernel_div_single_f16( device const half * src0, device const half * src1, device half * dst, constant int64_t & ne00, constant int64_t & ne01, constant int64_t & ne02, constant int64_t & ne03, uint3 tgpig[[threadgroup_position_in_grid]], uint3 tpitg[[thread_position_in_threadgroup]], uint3 ntg[[threads_per_threadgroup]]) { const int64_t i03 = tgpig.z; const int64_t i02 = tgpig.y; const int64_t i01 = tgpig.x; const uint offset = i03*ne02*ne01*ne00 + i02*ne01*ne00 + i01*ne00; for (int i0 = tpitg.x; i0 < ne00; i0 += ntg.x) { dst[offset + i0] = src0[offset+i0] / *src1; } } My mac book is, Macbork Pro(16, 2021) / macOS 12.5 / Apple M1 Pro. Are there any issues related to Div? Thanks in advance for your reply.
1
0
426
Jul ’24