It looks like all of our iOS buffers are allocated at a 128K minimum, even small uniform buffers (as reported by MTLBuffer.allocatedSize). device.currentAllocatedSize also reflects this larger total. But on macOS with the same Metal code, the buffers are 4K or less. This was hard to track down, since it came out to 4300 small MTLBuffers x 128K ≈ 530 MB. Is there some larger page granularity on iPadOS that is killing us here and isn't the case on macOS (Intel)?
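For context, these numbers come from comparing MTLBuffer.length against MTLBuffer.allocatedSize. A minimal standalone sketch of that measurement (the buffer sizes and storage mode here are placeholders, not our real allocations):

import Foundation
import Metal

let device = MTLCreateSystemDefaultDevice()!

// Allocate a few deliberately tiny buffers and compare what we asked for
// (length) with what Metal actually reserved (allocatedSize).
let requestedSizes = [64, 256, 4_096, 48_000]   // bytes; arbitrary examples
var buffers: [MTLBuffer] = []

for size in requestedSizes {
    let buffer = device.makeBuffer(length: size, options: .storageModeShared)!
    buffers.append(buffer)
    print(String(format: "request: %.3f MB  actual: %.3f MB",
                 Double(buffer.length) / 1_048_576,
                 Double(buffer.allocatedSize) / 1_048_576))
}

// Device-wide total, which on iPadOS tracks the inflated per-buffer numbers.
print(String(format: "device total: %.1f MB",
             Double(device.currentAllocatedSize) / 1_048_576))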
So page sizes are 4x what we see on macOS. That does waste memory, but it's 16K, not 128K. Still, if an MTLResource is mapped to a page minimum, then this does waste more space on iOS.
iPadOS:
expr (int)getpagesize()
(int) $1 = 16384

macOS:
expr (int)getpagesize()
(int) $0 = 4096
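If the 16K page size were the whole story, the expected overhead per buffer would just be the request rounded up to a page boundary. A quick sketch of that math (assuming allocations are page-aligned and page-rounded, which is our working theory, not something Apple documents for MTLBuffer):

import Darwin

// Round a requested buffer size up to the next page boundary.
func roundUpToPage(_ bytes: Int) -> Int {
    let page = Int(getpagesize())           // 16384 on iPadOS, 4096 on Intel macOS
    return (bytes + page - 1) / page * page
}

// A 256-byte uniform buffer "should" cost one page under this theory:
// 16 KB on iPadOS vs 4 KB on macOS -- wasteful, but nowhere near the
// 128 KB that allocatedSize is actually reporting for tiny buffers.
print(roundUpToPage(256))       // 16384 on iPadOS, 4096 on Intel macOS
print(roundUpToPage(20_000))    // 32768 on iPadOS, 20480 on Intel macOS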
Also, this is strange: some of our buffers report less than 128K allocated, but the super small ones always return 128K. It's like Metal is sticking the super tiny allocations into a shared buffer, and then we (and GPU capture) are double-counting the memory.
VertexData1  request: 0.046 MB  actual: 0.047 MB
VertexData0  request: 0.000 MB  actual: 0.125 MB
VertexData_2 request: 0.000 MB  actual: 0.125 MB
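A quick way to probe that double-counting theory (a sketch; `allBuffers` is a stand-in for however you track your live MTLBuffers):

import Metal

// Hypothetical list of every live MTLBuffer we've created.
var allBuffers: [MTLBuffer] = []

func checkForDoubleCounting(device: MTLDevice) {
    // Sum what each buffer claims to occupy individually.
    let perBufferTotal = allBuffers.reduce(0) { $0 + $1.allocatedSize }

    // If tiny buffers share a 128K backing allocation and each reports the
    // full 128K, this sum will overshoot the device-level total.
    print("sum of allocatedSize:", perBufferTotal / 1_048_576, "MB")
    print("device.currentAllocatedSize:", device.currentAllocatedSize / 1_048_576, "MB")
}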
In both cases, GPU capture was enabled on macOS and iOS, but only iOS showed a ~500 MB increase in GPU capture's Memory Viewer when the capture was performed. The macOS numbers were reasonable and what I expected. When I disabled GPU capture, the memory difference reported by Xcode's memory panel (not GPU capture) was only 10 MB or so.
We have our own reporting system, so I'll dump its numbers with and without capture enabled below. I was seeing numbers similar to the capture's when vertex buffers weren't de-duped before being added up. That happens when you have sub-ranges of vertices/indices all packed into a single VB/IB.
With and without capture enabled, the amount reported by our tool (which accumulates the de-duped data) is the same:
Total device size: 865 MB
Total vertex size: 588 MB
Total texture size: 198 MB
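For completeness, the de-dupe is conceptually just counting each unique MTLBuffer once, even when many meshes reference sub-ranges of it. A sketch (the `Mesh` type and its fields are made up for illustration):

import Metal

// Hypothetical mesh record: many meshes can point at sub-ranges of the
// same shared vertex/index buffer.
struct Mesh {
    var vertexBuffer: MTLBuffer
    var indexBuffer: MTLBuffer
}

func totalBufferMemory(meshes: [Mesh]) -> Int {
    var seen = Set<ObjectIdentifier>()
    var total = 0
    for mesh in meshes {
        for buffer in [mesh.vertexBuffer, mesh.indexBuffer] {
            // Count each underlying MTLBuffer once, no matter how many
            // meshes carve sub-ranges out of it.
            if seen.insert(ObjectIdentifier(buffer)).inserted {
                total += buffer.allocatedSize
            }
        }
    }
    return total
}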
Looks like this is still a problem with SPIRV-Cross not placing interpolation modifiers onto vertex shader outputs. Can we get any clarification from Apple, since the docs are super confusing on these? They state that only the "fragment input" needs these modifiers, but the vertex output is also mostly the fragment input. I assume the VS output and PS input both need the same modifier, or the render pipeline fails to link.
https://github.com/KhronosGroup/SPIRV-Cross/issues/1542
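For anyone hitting the same thing, here's a minimal sketch of what we expect the generated MSL to look like, with the interpolation qualifier mirrored on the vertex output and the fragment input it feeds. The shader source, names (vsMain, fsMain, materialID, locn0), and pipeline setup are illustrative, not actual SPIRV-Cross output:

import Metal

let source = """
#include <metal_stdlib>
using namespace metal;

// Vertex output: interpolation qualifier on the output member...
struct VSOut {
    float4 position [[position]];
    uint materialID [[user(locn0), flat]];
};

// ...and the same qualifier on the matching fragment input.
struct FSIn {
    uint materialID [[user(locn0), flat]];
};

vertex VSOut vsMain(uint vid [[vertex_id]]) {
    VSOut out;
    out.position = float4(0.0);
    out.materialID = vid;
    return out;
}

fragment float4 fsMain(FSIn in [[stage_in]]) {
    return float4(float(in.materialID));
}
"""

func buildPipeline() throws -> MTLRenderPipelineState {
    let device = MTLCreateSystemDefaultDevice()!
    let library = try device.makeLibrary(source: source, options: nil)

    let desc = MTLRenderPipelineDescriptor()
    desc.vertexFunction = library.makeFunction(name: "vsMain")
    desc.fragmentFunction = library.makeFunction(name: "fsMain")
    desc.colorAttachments[0].pixelFormat = .bgra8Unorm

    // If the qualifiers disagree between VSOut and FSIn, this is the call
    // where we'd expect the pipeline to fail to build/link.
    return try device.makeRenderPipelineState(descriptor: desc)
}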