In Metal, how do I synchronize calculation results in a kernel function?

I want to use a Metal kernel function to calculate the histogram of a video frame (texture).

Some of the code in the kernel function looks like this:
Code Block
float gray = grayScale(rgbValue);
int histogramIndex = int(clamp(gray, 0.0, 1.0) * 255);
histogram[histogramIndex] += 1;


It runs, but the result is wrong: when I sum every entry in the histogram array, the total does not equal the number of pixels in my texture. It is always less than the total pixel count, so I think there must be a multi-thread synchronization problem.

So how can I synchronize the kernel function?

I know MPS provides MPSImageHistogram for this, and it works well, but I want to calculate the histogram in my own pipeline.

Thanks
Your histogram buffer needs to use atomic device memory to avoid inter-thread race conditions. How have you defined histogram?

I defined my histogram buffer as an Int32 array and converted it to an MTLBuffer. I found some atomic code that solves my problem; it looks like this:

Code Block
kernel void myCalculateHistogram(
    texture2d<half, access::read> texture [[texture(0)]],
    uint2 coordinate [[thread_position_in_grid]],
    device atomic_uint* histogram [[buffer(1)]]) {
  // Skip threads that fall outside the texture bounds.
  if (coordinate.x >= texture.get_width() || coordinate.y >= texture.get_height()) {
    return;
  }

  half4 colorValue = texture.read(coordinate);
  // configs is not declared in this snippet (assumed to come from elsewhere in the shader).
  if (configs[14] == 1) {
    half gray = grayScale(colorValue);
    int histogramIndex = int(clamp(gray, half(0.0), half(1.0)) * 255);
    // Atomic increment, so concurrent threads cannot lose each other's updates.
    atomic_fetch_add_explicit(&histogram[histogramIndex], 1, memory_order_relaxed);
  }
}


That works like a charm. :D
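
For anyone who wants to see the same idea outside of Metal, here is a minimal CPU sketch in C++. It is only an analogue of the kernel above, not Metal code: CPU threads stand in for GPU threads, and the names `binIndex` and `atomicHistogram` and the thread count are made up for illustration. C++ has an `atomic_fetch_add_explicit` with the same name and memory-order argument, so the increment line looks almost identical. Because each increment is a single atomic read-modify-write, the bin totals should add up to the number of input pixels.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

constexpr int kBins = 256;

// Map a gray value to a bin index in [0, 255], mirroring
// int(clamp(gray, 0.0, 1.0) * 255) from the kernel.
int binIndex(float gray) {
    float clamped = gray < 0.0f ? 0.0f : (gray > 1.0f ? 1.0f : gray);
    return static_cast<int>(clamped * 255.0f);
}

// Hypothetical helper: builds a histogram of `pixels` with `threadCount` threads.
// Every increment is atomic, so concurrent updates to one bin are not lost.
std::vector<uint32_t> atomicHistogram(const std::vector<float>& pixels,
                                      unsigned threadCount) {
    assert(threadCount > 0);

    std::vector<std::atomic<uint32_t>> bins(kBins);
    for (auto& bin : bins) {
        bin.store(0, std::memory_order_relaxed);  // start from a zeroed histogram
    }

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        workers.emplace_back([&pixels, &bins, t, threadCount] {
            // Interleaved split: thread t handles pixels t, t + threadCount, ...
            for (std::size_t i = t; i < pixels.size(); i += threadCount) {
                std::atomic_fetch_add_explicit(&bins[binIndex(pixels[i])],
                                               uint32_t{1},
                                               std::memory_order_relaxed);
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();  // joining makes all increments visible to this thread
    }

    // Copy the final counts into a plain vector for the caller.
    std::vector<uint32_t> result(kBins);
    for (int i = 0; i < kBins; ++i) {
        result[i] = bins[i].load(std::memory_order_relaxed);
    }
    return result;
}
```

Relaxed ordering is enough here because the only requirement is that no increment is lost; nothing depends on the order in which bins are updated. In the Metal version, the histogram buffer also needs to be zeroed before each dispatch, otherwise counts from the previous frame carry over.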