Hi Dan.
Thanks for the response. I think we both have the same view of how heaps work, but there are a few points I'd like to emphasize. Unless I'm missing something important, or some other part of the API, I don't think Metal can really satisfy what I want to do.
In my example, yes, I may need to add some kind of synchronization so that nothing in cb2 executes until cb1 has completed, so let's assume I've done that. That code is a bit lengthy to write out, so I'll just add a comment where it would go.
But back up and assume I run the code below. It's got a few of the blanks filled in, but mainly it runs serially:
// get a heap with just enough to allocate a single 256x256xRGBA texture (I know we may need
// to query the API for alignment in a real app, but for simplicity....)
let heapDescriptor = MTLHeapDescriptor()
heapDescriptor.size = 256 * 1024
let heap = device.makeHeap(descriptor: heapDescriptor)!
// here's our 256x256xRGBA texture
let texd = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .bgra8Unorm, width: 256, height: 256, mipmapped: false)
cb1.enqueue()
cb2.enqueue()
let tex1 = heap.makeTexture(descriptor: texd)!
tex1.makeAliasable()
// use tex1
cb1.commit()
// ...let's insert an event to wait for cb1 to complete...
let tex2 = heap.makeTexture(descriptor: texd)!
tex2.makeAliasable()
// use tex2
cb2.commit()
So I would expect tex1 and tex2 to share the same physical memory (we may not know the address, but we'd expect both allocations to succeed). The heap is only just large enough to satisfy one allocation at a time; the second succeeds only because the first was made aliasable.
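For what it's worth, the synchronization placeholder above could be filled in with an MTLEvent. A minimal self-contained sketch (error handling omitted, and assuming a default system device is available):

```swift
import Metal

// Sketch: fill in the "wait for cb1 to complete" placeholder with an MTLEvent.
// Assumes a default system device; error handling omitted for brevity.
let device = MTLCreateSystemDefaultDevice()!
let queue = device.makeCommandQueue()!
let event = device.makeEvent()!

let cb1 = queue.makeCommandBuffer()!
let cb2 = queue.makeCommandBuffer()!
cb1.enqueue()
cb2.enqueue()

// ...encode cb1's work (use tex1) here...
cb1.encodeSignalEvent(event, value: 1)   // signalled once cb1's prior work finishes
cb1.commit()

cb2.encodeWaitForEvent(event, value: 1)  // cb2's subsequent work waits for the signal
// ...encode cb2's work (use tex2) here...
cb2.commit()
```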
The problem arises when we want to build the command buffers in parallel while relying on our knowledge that (a) the command buffers execute serially and (b) the heap is essentially 'empty' after a command buffer completes:
cb1.enqueue()
cb2.enqueue()
DispatchQueue.global(...).async {
let tex1 = heap.makeTexture(descriptor: texd)!
tex1.makeAliasable()
// use tex1
cb1.commit()
}
DispatchQueue.global(...).async {
let tex2 = heap.makeTexture(descriptor: texd)!
tex2.makeAliasable()
// ...let's insert an event to wait for cb1 to complete...
// use tex2
cb2.commit()
}
In that code one of the allocations will fail (depending on how the race plays out), because the heap is not large enough to satisfy both allocations simultaneously at the point where we're building the command buffers. But the command buffers are not executed in parallel: the allocations never need to exist at the same time during execution. We just need some way to express that concept to Metal.
Obviously this is just simple toy code. In a real app the heap might be large (hundreds of MB), and to build N command buffers in parallel and guarantee that all of them complete successfully, the heap has to be overcommitted to N times the size it really needs. More realistically, we take a performance hit and serialize command buffer building, which defeats the point of parallel command encoding (a really useful feature!).
In Vulkan we would have a custom allocator that effectively starts off in an empty state in both async blocks (using the same, single device allocation). MTLBuffer almost provides a mechanism for Vulkan-style custom allocators in the form of makeTexture(descriptor:offset:bytesPerRow:). However, MTLBuffer.makeTexture has a lot of restrictions (no mipmaps, no texture arrays, no depth/stencil, no render targets), which makes it a no-go.
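To make the buffer-backed path concrete, this is roughly the shape of a Vulkan-style sub-allocation via MTLBuffer.makeTexture. It's a sketch only, and it only works for plain linear 2D textures, which is exactly the limitation:

```swift
import Metal

// Sketch: sub-allocating a texture from a single MTLBuffer "heap".
// Only plain linear 2D textures are allowed on this path; no mipmaps,
// arrays, depth/stencil or render targets, which is what rules it out.
let device = MTLCreateSystemDefaultDevice()!

let texd = MTLTextureDescriptor.texture2DDescriptor(
    pixelFormat: .bgra8Unorm, width: 256, height: 256, mipmapped: false)
texd.storageMode = .shared  // must match the backing buffer's storage mode

// A real allocator would round the offset/row pitch up using
// device.minimumLinearTextureAlignment(for:); 1024 happens to work here.
let bytesPerRow = 256 * 4
let backing = device.makeBuffer(length: bytesPerRow * 256,
                                options: .storageModeShared)!

// Each parallel encoder could place its transient texture at its own offset
// (or even the same offset, given the serial execution we rely on).
let tex = backing.makeTexture(descriptor: texd, offset: 0,
                              bytesPerRow: bytesPerRow)!
```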
The only way I can get Metal to do what we need is something like the code below, but it's just not a very friendly pattern for an app to adhere to: it requires that all allocations from the heap are made within a serialized section of code. In a real app you have to walk the render graph once to discover and make allocations from the heap, stash the results somewhere (associating allocated resources with the various render nodes), and finally walk the graph again to build command encoders using those resources. It's just very awkward.
DispatchQueue.global(...).async {
heapMutex.lock()
let tex1 = heap.makeTexture(descriptor: texd)!
tex1.makeAliasable()
heapMutex.unlock()
// use tex1
cb1.commit()
}
DispatchQueue.global(...).async {
heapMutex.lock()
let tex2 = heap.makeTexture(descriptor: texd)!
tex2.makeAliasable()
heapMutex.unlock()
// use tex2
cb2.commit()
}
To me it seems there's some room for improvement here. Either MTLBuffer.makeTexture becomes a lot more flexible so that developers can write Vulkan-style custom allocators, or there should be a way to allocate resources from a heap that are 'local to a command buffer' or 'local to some instance of a heap' (I can't think of better language to express it).
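Purely to illustrate the shape of what I mean — none of the following exists in Metal, and makeInstance(for:) is an invented, hypothetical API:

```
// HYPOTHETICAL sketch — not real Metal API, just the shape of the idea.
DispatchQueue.global().async {
    // An imaginary per-command-buffer view of the heap that starts out empty,
    // so parallel encoders never contend for the heap's space at build time.
    let heapView = heap.makeInstance(for: cb1)          // hypothetical
    let tex1 = heapView.makeTexture(descriptor: texd)!
    // use tex1
    cb1.commit()   // heapView's space becomes reusable once cb1 completes
}
```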
Thanks!