Please remember that there are 3 GPU families on Mac OS X, each with its own unique driver bugs.
Your kernels might work on one device and fail on others. I have seen kernels that crash on 2011 AMD GPUs but work fine on 2013 and later AMD GPUs.
In my experience, 2011 AMD GPUs can't handle much thread divergence.
Recommendations:
1) Replace the float2 res variable with two scalars, say float resx and float resy. (Intermixing scalar and vector operations might trigger a compiler bug; this is only a hunch.)
2) Your for loop with the break statement is going to cause extreme thread divergence.
Remember that in a thread block, typically 16 hardware threads execute this kernel in lockstep.
Rewrite the loop to remove the break, like so:
    if ( res.x >= precis && t <= tmax ) {
        mat = res.y;
        t += res.x;
    }
With the break gone, every thread runs the same number of iterations, so there is no more divergence at the loop exit. I believe the break statement was the source of your problem.
Also, the check for res.x < precis will ALWAYS be false. Look at your code: both are constant values.
So it looks like the res.x variable can be discarded; just use the constant value in your code.
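
Here is a minimal sketch of both recommendations together, written as plain C so it can be tried on the CPU before being moved into a kernel. It is not your actual code. The names march, scene_distance and Hit are hypothetical, and it assumes that your original loop broke out when res.x < precis or t > tmax, so the guard below is simply the negation of that test. It also assumes res.x is a per-step distance; if it really is a constant in your code, substitute the constant.

```c
/* Result of a march: distance travelled and material id (-1 = nothing set). */
typedef struct { float t; float mat; } Hit;

/* Hypothetical scene: a ray along one axis towards a sphere of radius 1
   centred 5 units away. Returns the distance to the surface at position t. */
static float scene_distance(float t) {
    float d = t - 5.0f;
    if (d < 0.0f) d = -d;          /* absolute value without math.h */
    return d - 1.0f;
}

/* Fixed-trip-count march: there is no break, so every thread in a block
   would run exactly max_steps iterations. Finished threads just skip the
   update inside the guard. */
static Hit march(float tmin, float tmax, float precis, int max_steps) {
    Hit hit = { tmin, -1.0f };
    for (int i = 0; i < max_steps; ++i) {
        /* Two scalars instead of one float2 (recommendation 1). */
        float resx = scene_distance(hit.t);  /* distance to the surface */
        float resy = 1.0f;                   /* material id (assumed)   */

        /* Negation of the assumed break test (recommendation 2). */
        if (resx >= precis && hit.t <= tmax) {
            hit.mat = resy;
            hit.t += resx;
        }
    }
    return hit;
}
```

For example, march(0.0f, 20.0f, 0.001f, 50) stops advancing once t reaches the sphere surface at 4. The trade-off is that threads which finish early still execute the remaining iterations, so keep max_steps as small as your scene allows.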