I have an M2 Mac Mini with Apple Clang 16.0.0. Under certain circumstances, the SIMD code generated for an unrolled loop is incorrect.
I have a short example program which reproduces the bug, on my machine and someone else's with the same Clang version. The core operation is this:
    for (size_t i = 0; i < count; ++i) {
        c[i] = a[i] * std::conj(b[i]);
    }
This loop gets unrolled to process 4 elements at once, and when count=15, the first 12 results have the wrong sign for the imaginary part. The final 3 elements are correct, since those are processed in a different code path.
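For anyone who wants to check this without cloning the repo, here is a rough sketch of the kind of comparison involved. It is not the linked test program: the element type (std::complex<float>), the input values, and the names multiplyConj, referenceConj and maxError are all assumptions made for illustration. It runs the loop above and compares each result against the product written out by hand, (ar*br + ai*bi) + i(ai*br - ar*bi). Whether it triggers the miscompilation will depend on optimisation flags and inlining, so a clean result here does not rule the bug out.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Assumption: single-precision complex values; the original post does not say.
using Complex = std::complex<float>;

// The loop from the report, wrapped in a function (hypothetical name).
void multiplyConj(const Complex *a, const Complex *b, Complex *c, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        c[i] = a[i] * std::conj(b[i]);
    }
}

// Scalar reference written out by hand:
// (ar + i*ai) * (br - i*bi) = (ar*br + ai*bi) + i*(ai*br - ar*bi)
Complex referenceConj(Complex a, Complex b) {
    return Complex(a.real() * b.real() + a.imag() * b.imag(),
                   a.imag() * b.real() - a.real() * b.imag());
}

// Largest absolute difference between the loop and the reference.
// The input values are arbitrary and chosen only to be non-trivial.
float maxError(std::size_t count) {
    std::vector<Complex> a(count), b(count), c(count);
    for (std::size_t i = 0; i < count; ++i) {
        float x = static_cast<float>(i);
        a[i] = Complex(1.0f + 0.1f * x, 0.2f * x - 1.0f);
        b[i] = Complex(0.3f * x - 2.0f, 0.05f * x + 0.5f);
    }
    multiplyConj(a.data(), b.data(), c.data(), count);

    float worst = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        worst = std::max(worst, std::abs(c[i] - referenceConj(a[i], b[i])));
    }
    return worst;
}
```

Calling maxError(15) from main and printing the result should give something close to zero on a correct build. A flipped imaginary sign in the first 12 elements would show up as an error comparable in size to the values themselves.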
Is this a known error? I suspect it might be present in other Apple Clang versions as well (because I found this while chasing down an extremely unpredictable bug), but so far this is the only setup where I've cleanly reproduced it.
Minimal test program (43 lines): https://signalsmith-audio.co.uk/tmp/argh.git/ - just run make.
The expected output is a bunch of error=0, or small values from floating-point errors.
I'm getting results like error=0.229711, and you can see it's because the "actual" results have a ± error.