I have been trying to learn Metal for the past couple of months. In the process, I came across some examples and articles on sorting networks and decided to try implementing them in Metal.
The problem is that if I run the code on my Mac, everything is fine. But if I run the same code on my iDevice (iPad Pro with LiDAR), I get all sorts of errors I do not understand, or the sorted data comes back corrupted and all wrong.
Typical Error
```
2021-02-17 12:13:11.218394-0500 METAL_ISSUE[97650:6709092] [GPUDebug] Invalid device load executing kernel function "bitonicSort" encoder: "0", dispatch: 0, at offset 384
file:///Users/staque/Development/OTHER/METAL_ISSUE/METAL_ISSUE/Shaders.metal:77:40 - bitonicSort()
<MTLBufferArgument: 0x28006d200>
    Name = floats
    Type = MTLArgumentTypeBuffer
    Access = MTLArgumentAccessReadWrite
    LocationIndex = 0
    IsActive = 1
    ArrayLength = 1
    TypeInfo =
        DataType = MTLDataTypePointer
        ElementType = MTLDataTypeFloat
        Access = MTLArgumentAccessReadWrite
        Alignment = 4
        DataSize = 4
    Alignment = 4
    DataSize = 4
    DataType = MTLDataTypeFloat
buffer: "<unknown>"
```
You can pretty much drop these into the default Xcode Metal Game app.
Shader (slightly modified to track the indexes of the floats.)
```metal
/*
 [Using Code based off of this](https://github.com/tgymnich/MetalSort)
 Rewritten to make it more understandable.
 */
kernel void bitonicSort(device float *floats [[ buffer(0) ]],
                        device int *uInts [[ buffer(1) ]],
                        constant int &p [[ buffer(2) ]],
                        constant int &q [[ buffer(3) ]],
                        uint gid [[ thread_position_in_grid ]])
{
    int pMinusQ = p-q;
    int distance = 1 << pMinusQ;
    uint gidShiftedByP = gid >> p;
    // True: Increasing / False: Decreasing
    bool direction = (gidShiftedByP & 2) == 0;
    uint gidDistance = (gid & distance);
    bool isGidDistanceZero = (gidDistance == 0);
    uint gidPlusDistance = (gid | distance);
    bool isLowerIndexGreaterThanHigher = (floats[gid] > floats[gidPlusDistance]);

    if (isGidDistanceZero && isLowerIndexGreaterThanHigher == direction) {
        float temp = floats[gid];
        floats[gid] = floats[gidPlusDistance];
        floats[gidPlusDistance] = temp;

        int temp2 = uInts[gid];
        uInts[gid] = uInts[gidPlusDistance];
        uInts[gidPlusDistance] = temp2;
    }
}
```
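For comparison, here is a minimal CPU sketch in Swift of what I understand the kernel above to compute, which may be useful for checking the GPU output on either device. The names `bitonicStepCPU` and `bitonicSortCPU` are hypothetical and not part of the project. The explicit bounds check on the partner index is an assumption of this sketch and is not in the shader. The index array here starts as the original positions rather than the reversed values used in the call below.

```swift
// CPU reference for one (p, q) pass, mirroring the kernel's compare-and-swap logic.
// Hypothetical helper for illustration only.
func bitonicStepCPU(_ values: inout [Float], _ indices: inout [UInt32], p: Int, q: Int) {
    let distance = 1 << (p - q)
    for gid in 0..<values.count {
        // True: increasing / False: decreasing (same test as the shader).
        let ascending = ((gid >> p) & 2) == 0
        let partner = gid | distance
        // Only the lower element of each pair does the swap.
        // The partner bounds check is an addition of this sketch; the shader has none.
        guard (gid & distance) == 0, partner < values.count else { continue }
        if (values[gid] > values[partner]) == ascending {
            values.swapAt(gid, partner)
            indices.swapAt(gid, partner)
        }
    }
}

// Runs every (p, q) pass in the same order as the host-side loops.
// Returns the sorted values and, for each sorted slot, the element's original position.
func bitonicSortCPU(_ input: [Float]) -> (values: [Float], indices: [UInt32]) {
    precondition(input.count > 0 && (input.count & (input.count - 1)) == 0,
                 "count must be a power of 2")
    var values = input
    var indices = Array(0..<UInt32(input.count))
    let logn = input.count.trailingZeroBitCount
    for p in 0..<logn {
        for q in 0...p {
            bitonicStepCPU(&values, &indices, p: p, q: q)
        }
    }
    return (values, indices)
}
```

For example, `bitonicSortCPU([3, 1, 2, 0])` returns the values `[0, 1, 2, 3]` with indices `[3, 1, 2, 0]`. Because each pair is touched by exactly one `gid`, iterating sequentially gives the same result as one parallel pass should.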
The call.
```swift
func runSort() {
    let device = MTLCreateSystemDefaultDevice()!
    let commandQueue = device.makeCommandQueue()!
    let library = device.makeDefaultLibrary()!
    let sortFunction = library.makeFunction(name: "bitonicSort")!
    let pipeline = try! device.makeComputePipelineState(function: sortFunction)

    let setRange = 0..<1024
    var floatData = [Float]()
    var uintData = [UInt32]()

    // Build the Float and index data backward to form the worst-case scenario for sorting.
    for value in stride(from: Float(setRange.upperBound-1), to: Float(setRange.lowerBound-1), by: -1.0) {
        floatData.append(value)
    }
    for value in stride(from: setRange.upperBound-1, to: setRange.lowerBound-1, by: -1) {
        uintData.append(UInt32(value))
    }

    print(floatData)
    print("")
    print(uintData)

    guard let logn = Int(exactly: log2(Double(floatData.count))) else {
        fatalError("data.count is not a power of 2")
    }

    for p in 0..<logn {
        for q in 0..<p+1 {
            let floatDataBuffer = device.makeBuffer(bytes: &floatData,
                                                    length: MemoryLayout<Float>.stride * floatData.count,
                                                    options: [.storageModeShared])!
            floatDataBuffer.label = "floatDataBuffer"

            let uintDataBuffer = device.makeBuffer(bytes: &uintData,
                                                   length: MemoryLayout<UInt32>.stride * uintData.count,
                                                   options: [.storageModeShared])!
            uintDataBuffer.label = "uintDataBuffer"

            let threadgroupsPerGrid = MTLSize(width: floatData.count, height: 1, depth: 1)
            let threadsPerThreadgroup = MTLSize(width: pipeline.threadExecutionWidth, height: 1, depth: 1)

            var n1 = p
            var n2 = q

            let commandBuffer = commandQueue.makeCommandBuffer()!
            let encoder = commandBuffer.makeComputeCommandEncoder()!

            encoder.setComputePipelineState(pipeline)
            encoder.setBuffer(floatDataBuffer, offset: 0, index: 0)
            encoder.setBuffer(uintDataBuffer, offset: 0, index: 1)
            encoder.setBytes(&n1, length: MemoryLayout<Float>.stride, index: 2)
            encoder.setBytes(&n2, length: MemoryLayout<UInt32>.stride, index: 3)
            encoder.dispatchThreadgroups(threadgroupsPerGrid, threadsPerThreadgroup: threadsPerThreadgroup)
            encoder.endEncoding()

            commandBuffer.commit()
            commandBuffer.waitUntilCompleted()

            // Copy the results back so the next pass starts from the updated data.
            let dataPointer = floatDataBuffer.contents().assumingMemoryBound(to: Float.self)
            let dataBufferPointer = UnsafeMutableBufferPointer(start: dataPointer, count: floatData.count)
            floatData = Array.init(dataBufferPointer)

            let dataPointer2 = uintDataBuffer.contents().assumingMemoryBound(to: UInt32.self)
            let dataBufferPointer2 = UnsafeMutableBufferPointer(start: dataPointer2, count: uintData.count)
            uintData = Array.init(dataBufferPointer2)
        }
    }

    print(floatData)
    print("")
    print(uintData)
}
```
If anyone has a clue what I should be doing differently, I am all ears, because I need help.
Thanks in advance.
Stan