Another thing I'd like to see in MPS is support for encoding into indirect compute commands. I recently drafted plans for adding a Metal backend to DL4S, a deep learning framework for Swift. That backend has to dispatch commands semi-eagerly, so you can't pre-compile them into graphs as you can with MPSGraph. Being able to use indirect command buffers in a JIT compiler like XLA (tensorflow.org/xla) would provide opportunities to reduce encoding overhead.
This isn't encouraged by Apple, but I found a way to load the raw MPS shaders by peering into a private Metallib directory accessible from public APIs. I'll go into as little detail as possible for obvious reasons, but it was possible to create compute pipeline states from MPS shaders. If I had studied them longer, I could have built an indirect command buffer workflow around them. However, there are numerous details about MPS's internals that I don't know, so I might accidentally do something unsafe. I'm mentioning this because it shows the MPS team can theoretically pull this off - they just need to expose a safe public API for it. There is also a precedent for unique features geared toward rare performance use cases - MTLCommandQueue.makeCommandBufferWithUnretainedReferences().
I ended up scrapping plans for ICBs because I would need entirely custom shaders to safely execute GPU work, and Apple's MPS shaders far outperformed mine. With that restriction gone, I readily changed my plans to use MPS. For more context on how this played out, you can check out some of the closed issues under the DL4S Evolution repository. I later shifted my efforts to Swift for TensorFlow, so that repo shouldn't experience major updates in the future.
I'm debating whether I should jump-start MetalFFT now, while I wait for the S4TF project to gain momentum in the Swift community (also to help out @CaptainHarlock). I would structure its API similarly to MPS, except that you would pass in either an MTLComputeCommandEncoder or an MTLIndirectComputeCommand instead of calling kernel.encode(commandBuffer:, ...). Perhaps the completion of MetalFFT will help the MPS team better understand my suggestion about ICBs. To the Graphics and Games Engineer responding to this post - could you route the info about MetalFFT and ICBs to the MPS team?
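
To make the API shape I have in mind more concrete, here is a rough sketch of a kernel that encodes into either target. Everything named here is hypothetical: MetalFFT doesn't exist yet, FFTKernelSketch is a placeholder type, and the buffer indices (0 for input, 1 for output) are assumptions about a shader I haven't written. Only the Metal calls themselves are real API.

```swift
import Metal

// Hypothetical placeholder type, NOT an actual MetalFFT or MPS API.
struct FFTKernelSketch {
    // For the indirect path, this pipeline must have been created from an
    // MTLComputePipelineDescriptor with supportIndirectCommandBuffers = true.
    let pipeline: MTLComputePipelineState
    let threadgroups: MTLSize
    let threadsPerThreadgroup: MTLSize

    // Eager path: encode directly into a compute command encoder.
    func encode(to encoder: MTLComputeCommandEncoder,
                input: MTLBuffer, output: MTLBuffer) {
        encoder.setComputePipelineState(pipeline)
        encoder.setBuffer(input, offset: 0, index: 0)   // assumed binding index
        encoder.setBuffer(output, offset: 0, index: 1)  // assumed binding index
        encoder.dispatchThreadgroups(threadgroups,
                                     threadsPerThreadgroup: threadsPerThreadgroup)
    }

    // Indirect path: encode the same work into one command of an ICB.
    func encode(to command: MTLIndirectComputeCommand,
                input: MTLBuffer, output: MTLBuffer) {
        command.setComputePipelineState(pipeline)
        command.setKernelBuffer(input, offset: 0, index: 0)
        command.setKernelBuffer(output, offset: 0, index: 1)
        command.concurrentDispatchThreadgroups(threadgroups,
                                               threadsPerThreadgroup: threadsPerThreadgroup)
    }
}
```

The point is that the caller owns the encoder or the indirect command, so the kernel never has to create one from a command buffer the way kernel.encode(commandBuffer:, ...) does. A real implementation would also have to deal with resource residency (useResource) and barriers between dependent commands, which I've left out.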