Hi, all.
I've been writing various computational functions using Metal.
However, in the kernel below, the / operation shows an accuracy issue that + and * do not.
The kernel divides a tensor of shape [n, x, y] by a scalar of shape [1].
If I change the operator in the kernel from / to * or +, the results match numpy and torch exactly, but with / the mean differs by more than 1e-5.
(For reference, the kernel is based on the Metal kernel code in llama.cpp.)
kernel void kernel_div_single_f16(
        device const half * src0,
        device const half * src1,
        device       half * dst,
        constant  int64_t & ne00,
        constant  int64_t & ne01,
        constant  int64_t & ne02,
        constant  int64_t & ne03,
        uint3 tgpig[[threadgroup_position_in_grid]],
        uint3 tpitg[[thread_position_in_threadgroup]],
        uint3   ntg[[threads_per_threadgroup]]) {
    const int64_t i03 = tgpig.z;
    const int64_t i02 = tgpig.y;
    const int64_t i01 = tgpig.x;

    const uint offset = i03*ne02*ne01*ne00 + i02*ne01*ne00 + i01*ne00;

    for (int i0 = tpitg.x; i0 < ne00; i0 += ntg.x) {
        dst[offset + i0] = src0[offset + i0] / *src1;
    }
}
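To make the kind of mismatch concrete, here is a minimal CPU-only sketch in plain Python (not my actual test harness, and it needs no numpy). The inputs and the helpers `to_half` and `mean_abs_diff` are made up for illustration. The "multiply by a rounded reciprocal" path is only my guess at one way a GPU division could differ from a correctly rounded half division; I have not confirmed that Metal does this.

```python
import struct

def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def mean_abs_diff(a, b):
    """Mean absolute element-wise difference between two equal-length lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Hypothetical stand-ins for my real inputs: a flattened half tensor and a half scalar.
src0 = [to_half(i / 64.0) for i in range(1, 513)]
src1 = to_half(3.0)

# Reference: compute the quotient in double precision, then round once to half.
ref = [to_half(x / src1) for x in src0]

# Assumed alternative: round 1/src1 to half first, then multiply and round again.
recip = to_half(1.0 / src1)
approx = [to_half(x * recip) for x in src0]

print("mean |ref - ref|    =", mean_abs_diff(ref, ref))
print("mean |approx - ref| =", mean_abs_diff(approx, ref))
```

If the second number is non-zero while the same experiment with * or + gives zero, that would at least be consistent with what I see on the GPU, although it does not prove the cause.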
My machine is a MacBook Pro (16-inch, 2021) with an Apple M1 Pro, running macOS 12.5.
Are there any known accuracy issues with division in Metal? Thanks in advance for your reply.