High Sierra, possible bug with compute kernels, Intel HD 5000

My kernels are quite heavy: for a 128x128x128 grid, each thread runs a loop of 32,000 iterations (it is an energy grid for a molecular simulation of a crystal with 32,000 atoms).

The parallelization is over the grid, not over the atoms, for convenience. This works fine on Sierra and El Capitan, even though it takes a couple of seconds to compute.

However, on High Sierra it only computes 1/4 of the grid; I suspect that all results after a certain time-out are thrown out. If I reduce the amount of work per kernel it shows 1/2 the grid, and if I reduce it further it shows the whole grid.
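To make the shape of the work concrete, here is a minimal CPU-side sketch in Python (not the actual Metal kernel). It assumes one thread per grid point and a loop over all atoms inside each thread. The names `grid_point_energy`, `energy_grid` and `total_pair_evaluations` are hypothetical, and the pair term is only a placeholder for the real potential.

```python
def grid_point_energy(point, atoms):
    """Work of one thread: sum a pair term over every atom."""
    total = 0.0
    for atom in atoms:
        # Squared distance between this grid point and the atom.
        r2 = sum((p - a) ** 2 for p, a in zip(point, atom))
        total += 1.0 / (r2 + 1.0)  # placeholder pair term, not the real potential
    return total


def energy_grid(n, atoms):
    """One 'thread' per grid point of an n x n x n grid (run serially here)."""
    return [
        grid_point_energy((x, y, z), atoms)
        for x in range(n)
        for y in range(n)
        for z in range(n)
    ]


def total_pair_evaluations(n, num_atoms):
    """Number of grid-point/atom pair terms evaluated in a single dispatch."""
    return n ** 3 * num_atoms
```

With the sizes from the post, `total_pair_evaluations(128, 32000)` gives 67,108,864,000 pair terms in a single dispatch, which is why the run time per dispatch matters here.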


My kernels work fine on El Capitan and Sierra, on Intel, NVidia and AMD. They work on High Sierra on NVidia and AMD, but not on Intel HD 5000 (the MacBook Air).


My problem is that I know the results are wrong on Intel HD 5000, but no error appears. I check the error of the commandBuffer, and it is nil. Clearly not all kernels have run.

Question: How can you know with 100% certainty that all kernels have run?

Replies

Have each kernel write a value (say, 1) to an output buffer such that if all kernels run, the buffer is filled with 1s. Check it CPU-side. That should verify that all kernels have run. Did I understand the question correctly?
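The CPU-side check could look roughly like the Python sketch below. It is a simulation, not Metal code: `run_kernel` is a hypothetical stand-in for the GPU in which only the first `completed` threads write their flag, and `first_missing` and `completed_fraction` are hypothetical helpers for scanning the flag buffer after the command buffer has finished.

```python
def run_kernel(num_threads, completed):
    """Stand-in for the GPU: only the first `completed` threads write a 1."""
    flags = [0] * num_threads  # buffer is zeroed before the dispatch
    for tid in range(min(completed, num_threads)):
        flags[tid] = 1
    return flags


def first_missing(flags):
    """Index of the first thread that did not write its flag, or None."""
    for tid, flag in enumerate(flags):
        if flag != 1:
            return tid
    return None


def completed_fraction(flags):
    """Fraction of threads that wrote their flag."""
    return sum(1 for flag in flags if flag == 1) / len(flags)
```

For example, `first_missing(run_kernel(8, 8))` returns `None`, while `first_missing(run_kernel(8, 2))` returns `2` and `completed_fraction` reports `0.25`. Zeroing the buffer before every dispatch matters, because otherwise flags left over from an earlier run would hide missing threads.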

Yes, that's correct. The result is (current Xcode 9.0 and High Sierra on Intel HD 5000):

1) for a short-running kernel you get all 1's.

2) for a longer-running kernel the output starts with all 1's and the last part of the output buffer is all 0's.

So clearly not all kernels have run, but there is no time-out error in the command buffer. I am speculating that they timed out, because the whole computation takes about 15 seconds using a single dispatchThreadgroups on Intel HD 5000 (vs. less than 3 seconds on my Mac Pro Radeon R9 280X). Of course, normally you don't know the output (that is why you're computing it in the first place). The correctness of the compute result depends on the requirement that either all kernels have run, or you get an error to inform the user that something went wrong.


So far I have only noticed this on Intel HD 5000, but that might be because it is the slowest GPU I have tested on.


The first thing that comes to mind is a memory-access error, but each thread of my kernel basically reads at 'thread_position_in_grid', writes at 'thread_position_in_grid', and performs a for-loop of work. For loops of fewer than 8192 iterations the total work is small enough that it works, but for a larger number of iterations it goes wrong. I do see visual artifacts in Cocoa on my MacBook Air after this, which does indicate video-memory corruption. So far I have not been able to figure out what is causing this. However, splitting the work up into smaller pieces per thread, using multiple dispatchThreadgroups and accumulating the result, works fine.
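The splitting workaround can be sketched on the CPU as follows. This is Python, not Metal: each `(start, stop)` range stands for one dispatchThreadgroups call that handles only part of the atom loop and adds into an accumulation buffer. The helpers `chunk_ranges`, `pair_term` and `accumulate_chunked` are hypothetical, the integer pair term is a placeholder, and the chunk size of 4096 is an assumption chosen only because it is below the 8192 iterations that still worked.

```python
def chunk_ranges(total, chunk):
    """Split the loop range [0, total) into consecutive (start, stop) pieces."""
    return [(start, min(start + chunk, total)) for start in range(0, total, chunk)]


def pair_term(grid_index, atom_index):
    """Placeholder integer pair term, not the real potential."""
    return (grid_index + 1) * (atom_index % 7)


def accumulate_chunked(num_grid_points, num_atoms, chunk):
    """One pass per chunk; every pass adds its partial sums to the same buffer."""
    result = [0] * num_grid_points  # accumulation buffer, zeroed once up front
    for start, stop in chunk_ranges(num_atoms, chunk):
        # In Metal terms this inner part would be one smaller dispatch.
        for g in range(num_grid_points):
            result[g] += sum(pair_term(g, a) for a in range(start, stop))
    return result
```

For 32,000 atoms and a chunk size of 4096, `chunk_ranges(32000, 4096)` yields 8 ranges, the last one being `(28672, 32000)`. The accumulated result matches a single pass over all atoms, so only the amount of work per dispatch changes.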