Reply to Unexpected behavior for shared MTLBuffer during CPU work
Currently, since this project is a work in progress, the image pipeline executes only once. During that execution, the MTLCaptureManager captures the execution of the command buffer. There is no loop: it runs exactly once, and that single execution is analyzed. Within the image processing pipeline, this is the only spot where GPU-CPU synchronization occurs via the shared event. The shared event, as well as the other resources in the pipeline, are created before the command buffer is created. The resources used in the pipeline are all tracked by Metal (hazardTrackingMode = .tracked), though I hope to change this in the future and use heaps for more efficiency. Here is a brief overview of how the code is organized:

```swift
preloadResources()

// 1. Let Core Image render the CGImage into the Metal texture
let commandBufferDescriptor = // ... enable `encoderExecutionStatus` to capture errors
let ciCommandBuffer = commandQueue.makeCommandBuffer(descriptor: commandBufferDescriptor)!
let ciSourceImage = CIImage(cgImage: sourceImage)
ciContext.render(ciSourceImage,
                 to: sourceImageTexture,
                 commandBuffer: ciCommandBuffer,
                 bounds: sourceImageTexture.bounds2D,
                 colorSpace: CGColorSpaceCreateDeviceRGB())
ciCommandBuffer.commit()

// 2. Do the rest of the image processing
let commandBuffer = commandQueue.makeCommandBuffer(descriptor: commandBufferDescriptor)!
try imageProcessorA.encode(commandBuffer: commandBuffer,
                           sourceTexture: sourceImageTexture,
                           destinationTexture: sourceImageIntermediateTexture)
try imageProcessorA.encode(commandBuffer: curveDetectionCommandBuffer,
                           sourceTexture: sourceImageIntermediateTexture,
                           destinationTexture: destinationImageTexture)
commandBuffer.commit()
```

imageProcessorA contains kernelA and kernelB and performs the synchronization as described above. I suppose I could schedule a technical review session with an engineer to provide more details of the project if more context is needed to resolve the problem.
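For completeness, here is a minimal sketch of how the `encoderExecutionStatus` option alluded to above can be enabled and inspected; the completion-handler check is an illustration, not the project's actual error-handling code:

```swift
import Metal

// Enable per-encoder execution status so faults can be attributed to an encoder.
let descriptor = MTLCommandBufferDescriptor()
descriptor.errorOptions = .encoderExecutionStatus

// After completion, any error carries per-encoder info in its userInfo.
commandBuffer.addCompletedHandler { buffer in
    if let error = buffer.error as NSError?,
       let infos = error.userInfo[MTLCommandBufferEncoderInfoErrorKey]
           as? [MTLCommandBufferEncoderInfo] {
        for info in infos {
            print(info.label, info.errorState)
        }
    }
}
```

This costs a little overhead per command buffer, so it is typically enabled only while debugging.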
Mar ’22
Reply to Unexpected behavior for shared MTLBuffer during CPU work
```swift
extension MTLCommandBuffer {
    func encodeCPUExecution(for sharedEvent: MTLSharedEvent,
                            listener: MTLSharedEventListener,
                            work: @escaping () -> Void) {
        let value = sharedEvent.signaledValue
        sharedEvent.notify(listener, atValue: value + 1) { event, _ in
            work()
            event.signaledValue = value + 2
        }
        encodeSignalEvent(sharedEvent, value: value + 1)
        encodeWaitForEvent(sharedEvent, value: value + 2)
    }
}
```

This is the code for encodeCPUExecution; my mistake for not making it clear enough. The GPU does in fact wait on value + 2 as you described, yet the behavior still occurs. The issue is that the computation is well suited to CPU execution (it can actually take advantage of dynamic programming for O(n) time) and poorly suited to GPU execution, though I suppose you could have a single GPU thread write the result out the same way the CPU does (which is probably even more performant). I would still like to figure out why this behavior exists in the first place, even if the computation is pushed to a single thread on the GPU.
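To make the call pattern concrete, here is a minimal, hypothetical call site for encodeCPUExecution; the device, queue, and kernel names are assumptions for illustration, not the project's actual code:

```swift
import Metal

let device = MTLCreateSystemDefaultDevice()!
let sharedEvent = device.makeSharedEvent()!
// The listener's dispatch queue is where the `work` closure runs.
let listener = MTLSharedEventListener(dispatchQueue: DispatchQueue(label: "cpu-work"))

let commandBuffer = device.makeCommandQueue()!.makeCommandBuffer()!
// ... encode kernelA here ...
commandBuffer.encodeCPUExecution(for: sharedEvent, listener: listener) {
    // CPU-side pass over the shared MTLBuffer goes here.
}
// ... encode kernelB here; the GPU waits on value + 2 before running it ...
commandBuffer.commit()
```

Note that the wait is encoded on the command buffer, so everything encoded after the call is blocked until the CPU handler signals value + 2.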
Feb ’22
Reply to Storage of `ray_data` in ray tracing payload
Two reasons:

1. Sheer curiosity.
2. I was worried that I could run out of tile memory if ray_data payloads were stored there. I was hoping to implement something like what Rich Forster mentioned in the associated talk "Get to know Metal function pointers" (around 18:00 is the relevant section). There, he talks about divergence as a result of different threads invoking different functions. The solution (described around 19:00) was to use threadgroup memory to pass around relevant data, which I thought could be constrained by the size of the payload.

Of course, I figure this would have been mentioned somewhere if it were something to worry about, but it's interesting nonetheless.

P.S. I haven't written the ray tracing kernel yet, nor the intersection/visible function(s). But it was something I considered as I was designing my program.
Apr ’21