MTLHeap in concurrent environments.

I'm looking for some help/advice on how MTLHeap behaves in situations where multiple threads are building command buffers.


It's not really clear from the description of MTLHeap when storage is allocated. The sample code is mostly written as though the heap is being accessed from a single thread.


I've sketched out some pseudocode below that shows what I'm getting at. I create 2 command buffers and want to fill them from 2 independent threads. I enqueue the two command buffers, so I know they're going to be executed serially. There are no resources shared between cb1 and cb2. At the point cb1 finishes, I expect the heap to be empty.



    let heap = device.makeHeap(...)


    let cb1 = commandQueue.makeCommandBuffer()!
    let cb2 = commandQueue.makeCommandBuffer()!


    cb1.enqueue()
    cb2.enqueue()


    DispatchQueue.global(...).async {
        let tex1 = heap.makeTexture(...)
        tex1.makeAliasable()
        cb1.commit()
    }

    DispatchQueue.global(...).async {
        let tex2 = heap.makeTexture(...)
        tex2.makeAliasable()
        cb2.commit()
    }



Imagine that the two calls to MTLHeap.makeTexture in the async blocks above are executed concurrently.


It's not really clear from the description of MTLHeap *when* storage is allocated.


Does MTLHeap (1) allocate storage at the point MTLHeap.makeTexture is called? Or (2) does it effectively assign storage when the command buffer is committed/executed?


In the first case, the heap would need enough storage for both tex1 and tex2 concurrently, even though executing the two command buffers never needs that much memory at once. The heap might need to be significantly overcommitted to guarantee there are no allocation failures.


The second case would make more sense and behave as I expect, but I don't see any hint in the docs that this happens.


Thanks.

Replies

Storage for resources created from a heap is allocated when you create the heap. When you call makeTexture you're only saying that you're going to use that memory for the texture. The difference is that allocation reserves memory, preventing it from being used by other allocations. When you create an object from a heap, you're saying that the object should use some memory from the heap, but other objects may also use that same memory.


So when you create an object from a heap, it's up to you to ensure that multiple objects using the same memory behave nicely with each other. This means you need to explicitly add commands to A) signal when one object is done with memory in the heap and B) wait before another object uses that memory. If both objects are used in a single command buffer you can use an MTLFence object for these signal/wait commands. However, if the objects are used in separate command buffers, as in your example, you'll need to use an MTLEvent object (which is newly available in iOS 12 and macOS 10.14; see this sample showing how to use events).
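For the single-command-buffer case, a fence-based version might look roughly like the sketch below. This is just an illustration of the updateFence/waitForFence pairing, assuming `device`, `queue`, `heap`, and a texture descriptor `texd` already exist; compute passes stand in for whatever work you actually encode.

    // Sketch: aliasing two heap textures inside ONE command buffer,
    // ordered with an MTLFence.
    let fence = device.makeFence()!
    let cb = queue.makeCommandBuffer()!

    // Note: depending on SDK version the Swift bindings may or may not
    // declare this result optional; handle failure appropriately.
    let texA = heap.makeTexture(descriptor: texd)
    let pass1 = cb.makeComputeCommandEncoder()!
    // ... encode work that writes/reads texA ...
    pass1.updateFence(fence)     // signal: texA's memory is free after this pass
    pass1.endEncoding()

    texA.makeAliasable()         // allow texB to reuse texA's memory
    let texB = heap.makeTexture(descriptor: texd)

    let pass2 = cb.makeComputeCommandEncoder()!
    pass2.waitForFence(fence)    // wait before touching the aliased memory
    // ... encode work that uses texB ...
    pass2.endEncoding()

    cb.commit()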

Hi Dan.


Thanks for the response. I think we both have the same view of how heaps work but there were a few points that I think I'd like to emphasize. Unless I'm missing something important or some other part of the API, I don't think Metal is really able to satisfy what I want to do.


In my example, yes, I may need to add some kind of synchronization so that nothing in cb2 executes until cb1 has completed, so let's assume I've done that. That code is a bit lengthy to write out, so I'm just going to add a comment where it would go.


But back up and assume I run the code below. It fills in a few of the blanks, but mainly it runs serially:


    // Get a heap with just enough space for a single 256x256 BGRA8 texture.
    // (A real app should query heapTextureSizeAndAlign(descriptor:) for size
    // and alignment, but for simplicity....)
    let heapDescriptor = MTLHeapDescriptor()
    heapDescriptor.size = 256 * 1024

    let heap = device.makeHeap(descriptor: heapDescriptor)!

    // Here's our 256x256 BGRA8 texture descriptor
    let texd = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .bgra8Unorm,
                                                        width: 256, height: 256,
                                                        mipmapped: false)

    cb1.enqueue()
    cb2.enqueue()

    let tex1 = heap.makeTexture(descriptor: texd)
    tex1.makeAliasable()
    // use tex1
    cb1.commit()

    // ...insert an event here to wait for cb1 to complete...
    let tex2 = heap.makeTexture(descriptor: texd)
    tex2.makeAliasable()
    // use tex2
    cb2.commit()


So I would expect tex1 and tex2 to share the same physical memory (we may not know the address, but we'd expect both allocations to succeed). The heap is only just large enough to satisfy one allocation at a time, which works because tex1 is made aliasable before tex2 is created.


The problem arises when we want to build the command buffers in parallel while relying on our knowledge that (a) the command buffers are executed serially and (b) the heap is essentially 'empty' after a command buffer completes:


    cb1.enqueue()
    cb2.enqueue()

    DispatchQueue.global(...).async {
        let tex1 = heap.makeTexture(descriptor: texd)
        tex1.makeAliasable()
        // use tex1
        cb1.commit()
    }

    DispatchQueue.global(...).async {
        let tex2 = heap.makeTexture(descriptor: texd)
        tex2.makeAliasable()
        // ...insert an event here to wait for cb1 to complete...
        // use tex2
        cb2.commit()
    }


In that code one of the allocations will fail (depending on timing) because the heap is not large enough to satisfy both allocations simultaneously at the point we are building the command buffers. However, the command buffers are not executed in parallel, so the allocations never need to be live at the same time during execution. We just need some way to express that concept to Metal.


Obviously this is just simple toy code. In a real app the heap might be large (hundreds of MB), and building N command buffers in parallel while guaranteeing that all of them complete successfully requires the heap to be overcommitted by N times the amount it really needs. Or, more realistically, we take a performance hit and serialize command buffer building, which defeats the point of parallel command encoding (a really useful feature!).


In Vulkan we would use a custom allocator that effectively starts off empty in both async blocks (backed by the same, single device allocation). MTLBuffer almost has a mechanism for Vulkan-style custom allocators in the form of makeTexture(descriptor:offset:bytesPerRow:). However, MTLBuffer.makeTexture has a lot of restrictions (no mipmaps, no texture arrays, no depth/stencil, no render targets), so it's a no-go.
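For reference, the Vulkan-style pattern I have in mind would look roughly like this sketch over an MTLBuffer. The `LinearAllocator` name and the fixed alignment are mine, and this only works within makeTexture's restrictions (linear, non-mipmapped, non-renderable 2D textures), which is exactly the problem:

    // Sketch of per-thread linear sub-allocators over one MTLBuffer.
    // Each thread gets its own allocator starting at offset 0 of the same
    // buffer, so allocations alias across threads by construction -- the
    // pattern MTLHeap can't express. Assumes `device` exists.
    final class LinearAllocator {
        private let buffer: MTLBuffer
        private var offset = 0
        init(buffer: MTLBuffer) { self.buffer = buffer }

        func makeTexture(descriptor texd: MTLTextureDescriptor,
                         bytesPerRow: Int) -> MTLTexture? {
            // Align the offset as required for buffer-backed textures.
            // (A real app should query minimumLinearTextureAlignment(for:).)
            let align = 256
            offset = (offset + align - 1) / align * align
            let tex = buffer.makeTexture(descriptor: texd,
                                         offset: offset,
                                         bytesPerRow: bytesPerRow)
            offset += bytesPerRow * texd.height
            return tex
        }
    }

    let backing = device.makeBuffer(length: 256 * 1024,
                                    options: .storageModeShared)!
    // One allocator per thread, both starting at offset 0 of the same buffer:
    let alloc1 = LinearAllocator(buffer: backing)
    let alloc2 = LinearAllocator(buffer: backing)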


The only way I can get Metal to do what we need is something like the code below, but it's not a very friendly pattern for an app to adhere to. It requires that all heap allocations are made within a serialized section of code. In a real app you'd have to walk the render graph once to discover and make allocations from the heap, stash the results somewhere, associating the allocated resources with the various render nodes, and then walk the graph again to build command encoders using those resources. It's just very awkward.


    DispatchQueue.global(...).async {
        heapMutex.lock()
        let tex1 = heap.makeTexture(descriptor: texd)
        tex1.makeAliasable()
        heapMutex.unlock()

        // use tex1

        cb1.commit()
    }

    DispatchQueue.global(...).async {
        heapMutex.lock()
        let tex2 = heap.makeTexture(descriptor: texd)
        tex2.makeAliasable()
        heapMutex.unlock()

        // use tex2

        cb2.commit()
    }


To me it seems there's some room for improvement here. Either MTLBuffer.makeTexture becomes a lot more flexible, so that developers can write custom allocators as in Vulkan, or there should be a way to allocate resources from a heap that are 'local to a command buffer' or 'local to some instance of a heap' (I can't think of better language to express it).


Thanks!

"In that code one of the allocations will fail (depending on timing) because the heap is not large enough to satisfy both allocations simultaneously at the point we are building the command buffers."


If you use an MTLEvent to signal when cb1 is done with tex1 (i.e. call -[MTLCommandBuffer encodeSignalEvent:value:]) and then wait on that event with the same value before cb2 creates tex2 (-[MTLCommandBuffer encodeWaitForEvent:value:]), the allocation for tex2 will succeed.


"MTLBuffer.makeTexture has a lot of restrictions (no mipmaps, no texture arrays, no depth/stencil, no render targets) and it becomes a no go."


This is not accurate. You can definitely create a texture with mipmaps, a texture array, or a depth/stencil pixel format from a heap.

Again, thanks for the response.


1. I was actually talking about using an MTLBuffer (not MTLHeap) and dishing out texture allocations from it with a custom allocator (like a cheap Vulkan clone). However, MTLBuffer.makeTexture won't allow mipmapped textures or render targets, and has the other restrictions listed in the documentation here.


2. I'm trying to see how an event would help so I wrote out all the code and ran it just in case I was missing something. But it's not making sense to me.


The point of this exercise is to build command buffers in parallel on multiple CPU cores.


The problem is that MTLHeap doesn't defer allocations until the command buffer is submitted (i.e. until it knows the execution order). MTLHeap does all allocations on the spot, as makeTexture is called. Therefore, when two threads call makeTexture, the heap needs enough space to satisfy both allocations simultaneously, whereas the dynamic behaviour of execution only ever requires one texture to be live at a time.


So while I admit I can make the code work by doubling the size of the heap (which is what we actually do), it's just not a desirable outcome. On Vulkan we don't have this problem, because we're in control of the heap allocations ourselves and can use our knowledge of the execution patterns to optimize the allocations accordingly.


My point is that Metal has a use case here that isn't being satisfied.



    func test() {
        let device = MTLCreateSystemDefaultDevice()!
        let queue = device.makeCommandQueue()!
        let cb1 = queue.makeCommandBuffer()!
        let cb2 = queue.makeCommandBuffer()!

        let heapd = MTLHeapDescriptor()
        heapd.size = 256 * 1024
        heapd.storageMode = .private

        let heap = device.makeHeap(descriptor: heapd)!

        let texd = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .bgra8Unorm,
                                                            width: 256, height: 256,
                                                            mipmapped: false)
        texd.storageMode = .private

        cb1.enqueue()
        cb2.enqueue()

        let group = DispatchGroup()
        group.enter()
        group.enter()

        let event = device.makeEvent()!

        DispatchQueue.global(qos: .userInitiated).async {
            let tex1 = heap.makeTexture(descriptor: texd)

            // Bug in the Swift bindings to Metal: the ObjC result is actually
            // optional, but the Swift bindings don't reflect that. We can get
            // nil on failure; Swift complains about comparing a non-optional
            // to nil, but the precondition will still fail.
            precondition(tex1 != nil)

            // Simulate a large workload on this thread
            sleep(12)

            tex1.makeAliasable()

            cb1.encodeSignalEvent(event, value: 1)
            cb1.commit()
            group.leave()
        }

        DispatchQueue.global(qos: .userInitiated).async {
            cb2.encodeWaitForEvent(event, value: 1)

            let tex2 = heap.makeTexture(descriptor: texd)

            // (Same Swift bindings caveat as above.)
            precondition(tex2 != nil)

            // Simulate a large workload on this thread
            sleep(12)

            tex2.makeAliasable()

            cb2.commit()
            group.leave()
        }

        group.wait()

        print("done")
    }

When you call makeAliasable you're saying that the memory can immediately be used by another allocation from the heap. So, assuming tex1 and tex2 otherwise fit within the heap, both calls to heap.makeTexture should succeed. You should not need to double the size of the heap.
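One way to see this behaviour is to watch the heap's available size around the makeAliasable() call. This is only a sketch (assumes `device` exists, and that the heap is sized for roughly one 256x256 BGRA8 texture); the exact values reported will depend on the device's size/alignment requirements:

    // Sketch: observing how makeAliasable() returns memory to the heap.
    let heapd = MTLHeapDescriptor()
    heapd.size = 256 * 1024
    heapd.storageMode = .private
    let heap = device.makeHeap(descriptor: heapd)!

    let texd = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .bgra8Unorm,
                                                        width: 256, height: 256,
                                                        mipmapped: false)
    texd.storageMode = .private

    let tex1 = heap.makeTexture(descriptor: texd)
    print(heap.maxAvailableSize(alignment: 256))  // should be near zero: tex1 holds the memory

    tex1.makeAliasable()
    print(heap.maxAvailableSize(alignment: 256))  // should be back near heapd.size

    let tex2 = heap.makeTexture(descriptor: texd) // should succeed, aliasing tex1's memory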