Thanks to mmmetal and roserg! I did misunderstand the SIMD-group usage; you are right.
By the way, anyone who intends to use this feature can refer to the implementation in TF-Lite:
https://github.com/alpa-projects/tensorflow-alpa/blob/ee8f6612b515ada4509fa53491c5ba5b3ef8524a/tensorflow/lite/delegates/gpu/common/tasks/conv_metal_simd.cc
The bug is fixed.
Reason: I used an untracked MTLHeap without an MTLFence to protect it, so the GPU could execute both filters in parallel and thus read uninitialized dynamic texture data allocated from the heap.
Solution 1: use an MTLFence, see https://developer.apple.com/documentation/metal/synchronization/implementing_a_multistage_image_filter_using_heaps_and_events
Solution 2: switch the untracked MTLHeap to a tracked MTLHeap, that is, set heapDescriptor.hazardTrackingMode = MTLHazardTrackingModeTracked, see https://developer.apple.com/documentation/metal/mtlheapdescriptor/3131686-hazardtrackingmode
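
For anyone hitting the same issue, here is a rough Swift sketch of both fixes. It is not my actual project code: the helper names `makeTrackedHeap` and `encodeFilters`, the two closures, and the choice of compute encoders and private storage are assumptions for illustration. In Swift, `MTLHazardTrackingModeTracked` is spelled `.tracked`.

```swift
import Metal

// Solution 2 (sketch): create a heap whose resources Metal hazard-tracks automatically.
// `makeTrackedHeap` is a hypothetical helper name; `size` is chosen by the caller.
func makeTrackedHeap(device: MTLDevice, size: Int) -> MTLHeap? {
    let descriptor = MTLHeapDescriptor()
    descriptor.size = size
    descriptor.storageMode = .private          // assumption: GPU-only resources
    descriptor.hazardTrackingMode = .tracked   // MTLHazardTrackingModeTracked in Objective-C
    return device.makeHeap(descriptor: descriptor)
}

// Solution 1 (sketch): keep the heap untracked and order the two passes with an MTLFence.
// `encodeFirstFilter` and `encodeSecondFilter` are hypothetical closures that set the
// pipeline state, bind heap-allocated textures, and dispatch the work for each filter.
func encodeFilters(on commandBuffer: MTLCommandBuffer,
                   fence: MTLFence,
                   encodeFirstFilter: (MTLComputeCommandEncoder) -> Void,
                   encodeSecondFilter: (MTLComputeCommandEncoder) -> Void) {
    if let first = commandBuffer.makeComputeCommandEncoder() {
        encodeFirstFilter(first)
        first.updateFence(fence)     // fence is updated once this encoder's work completes
        first.endEncoding()
    }
    if let second = commandBuffer.makeComputeCommandEncoder() {
        second.waitForFence(fence)   // must come before any work that reads the first pass's output
        encodeSecondFilter(second)
        second.endEncoding()
    }
}
```

The fence itself would come from `device.makeFence()`. With a tracked heap no fence is needed for this case, since Metal inserts the synchronization itself, possibly at some performance cost compared with manual fences.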