I'm modifying less than 1 MB of a 256 MB managed buffer (calling didModifyRange), but according to Metal System Trace, the GPU copies the whole buffer (SDMA0 channel, "Page On 268435456 bytes"), which takes 13 ms.
I'm making lots of small modifications (~4 KB each) per frame. I also tried coalescing them into a single call to didModifyRange (~66 MB), and the entire buffer is still copied. When I call didModifyRange for only the first byte, though, the amount of data copied is small.
So I'm wondering why didModifyRange doesn't seem to be efficient for many small updates to a big buffer?
What you are seeing with this managed buffer is fairly normal. The didModifyRange method essentially signals to the driver that the buffer has been changed. If you want something better for a discrete GPU, store your data in a private buffer (MTLStorageModePrivate) and use a blit encoder to update the smaller sections. You should see much better performance with that.
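As a rough illustration of the private buffer plus blit encoder approach, here is a minimal Swift sketch. The BufferUpdate type and the encodeUpdates function are hypothetical names made up for this example, not part of Metal, and allocating a fresh shared staging buffer on every call is a simplification (a real app would probably reuse a small pool of staging buffers). It also assumes every offset and size is a multiple of 4, which the buffer-to-buffer blit copy requires on macOS.

```swift
import Metal

/// One small CPU-side change destined for the large GPU buffer.
/// (Hypothetical helper type for this sketch, not a Metal API.)
struct BufferUpdate {
    var destinationOffset: Int   // byte offset into the private buffer
    var bytes: [UInt8]           // new contents for that region
}

/// Packs all updates back to back into one shared staging buffer and encodes
/// one blit copy per update into `privateBuffer`.
/// Returns false if there was nothing to copy or a Metal object could not be created.
@discardableResult
func encodeUpdates(_ updates: [BufferUpdate],
                   into privateBuffer: MTLBuffer,
                   device: MTLDevice,
                   commandBuffer: MTLCommandBuffer) -> Bool {
    let pending = updates.filter { !$0.bytes.isEmpty }
    let totalBytes = pending.reduce(0) { $0 + $1.bytes.count }

    // Assumption: callers keep offsets and sizes 4-byte aligned (required on macOS).
    precondition(pending.allSatisfy { $0.destinationOffset % 4 == 0 && $0.bytes.count % 4 == 0 },
                 "blit offsets and sizes must be multiples of 4")

    guard totalBytes > 0,
          let staging = device.makeBuffer(length: totalBytes, options: .storageModeShared),
          let blit = commandBuffer.makeBlitCommandEncoder() else {
        return false
    }

    var stagingOffset = 0
    for update in pending {
        // CPU write into the shared staging buffer.
        update.bytes.withUnsafeBytes { src in
            staging.contents()
                .advanced(by: stagingOffset)
                .copyMemory(from: src.baseAddress!, byteCount: src.count)
        }
        // GPU-side copy of just this region into the private buffer.
        blit.copy(from: staging,
                  sourceOffset: stagingOffset,
                  to: privateBuffer,
                  destinationOffset: update.destinationOffset,
                  size: update.bytes.count)
        stagingOffset += update.bytes.count
    }
    blit.endEncoding()
    return true
}
```

You would call this once per frame on the same command buffer (or an earlier one on the same queue) as the work that reads the private buffer, so the copies are scheduled before that work. With this layout, only the bytes written to the staging buffer should need to cross the bus, rather than the full 256 MB.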