Metal texture allocated size versus actual image data size

Hello. In the iOS app I'm working on, we have a very tight memory budget, and I was looking at ways to reduce our texture memory usage. However, I noticed that for most of our textures there is no difference in allocated memory between ASTC8x8 and ASTC12x12, even though ASTC12x12 has less than half the bpp of 8x8. The difference only becomes apparent for textures 1024x1024 and larger, and even then the actual texture data is sometimes only 60% of the allocation size. I understand there must be some alignment and padding going on, but this seems extreme. In an example scene from my app, with ASTC12x12 for most textures, the ASTC data on disk and the same textures once loaded differ by over 100 MB, so I would love to recover even a portion of that memory.
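
For reference, the bit rates behind that comparison: ASTC stores every block in 128 bits whatever its footprint, so bits per texel depend only on the block size. A minimal C++ sketch (the helper name is just illustrative):

```cpp
#include <cassert>

// Every ASTC block is 128 bits (16 bytes) regardless of its footprint,
// so the bit rate is 128 divided by the number of texels per block.
double astcBitsPerTexel(int blockW, int blockH) {
    assert(blockW > 0 && blockH > 0);
    return 128.0 / (blockW * blockH);
}
```

With this, astcBitsPerTexel(8, 8) is 2.0 and astcBitsPerTexel(12, 12) is about 0.89, so 12x12 is indeed well under half the rate of 8x8.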

Here is some test code, with measurements I've taken on an iPhone 11:

// Objective-C++ (needs <Metal/Metal.h>); objCObj is presumably the app's id<MTLDevice>.
for (int i = 0; i < 11; i++) {
	MTLTextureDescriptor *texDesc = [[MTLTextureDescriptor alloc] init];
	texDesc.pixelFormat = MTLPixelFormatASTC_12x12_LDR;
	const int dim = 12;        // ASTC block footprint in texels (12 for 12x12, 8 for 8x8)
	const int n = 2 << i;      // texture edge length: 2, 4, 8, ..., 2048
	const int mips = i + 1;    // levels n, n/2, ..., 2 (the 1x1 level is not included)

	texDesc.width = n;
	texDesc.height = n;
	texDesc.mipmapLevelCount = mips;
	texDesc.resourceOptions = MTLResourceStorageModeShared;
	texDesc.usage = MTLTextureUsageShaderRead;

	// Calculate the equivalent tightly packed ASTC size:
	// each level needs ceil(edge / dim)^2 blocks of 16 bytes.
	int blocks = 0;
	for (int j = 0; j < mips; j++) {
		int a = n >> j;                  // edge length of mip level j
		int cur = (a + dim - 1) / dim;   // blocks per row, rounded up
		blocks += cur * cur;
	}

	id<MTLTexture> tex = [objCObj newTextureWithDescriptor:texDesc];
	printf("%dx%d, mips %d, Astc: %d, Metal: %d\n", n, n, mips, blocks * 16, (int)tex.allocatedSize);
}

MTLPixelFormatASTC_12x12_LDR
128x128, mips 7, Astc: 2768, Metal: 6016
256x256, mips 8, Astc: 10512, Metal: 32768
512x512, mips 9, Astc: 40096, Metal: 98304
1024x1024, mips 10, Astc: 158432, Metal: 262144

128x128, mips 1, Astc: 1936, Metal: 4096
256x256, mips 1, Astc: 7744, Metal: 16384
512x512, mips 1, Astc: 29584, Metal: 65536
1024x1024, mips 1, Astc: 118336, Metal: 147456

MTLPixelFormatASTC_8x8_LDR
128x128, mips 7, Astc: 5488, Metal: 6016
256x256, mips 8, Astc: 21872, Metal: 32768
512x512, mips 9, Astc: 87408, Metal: 98304
1024x1024, mips 10, Astc: 349552, Metal: 360448

128x128, mips 1, Astc: 4096, Metal: 4096
256x256, mips 1, Astc: 16384, Metal: 16384
512x512, mips 1, Astc: 65536, Metal: 65536
1024x1024, mips 1, Astc: 262144, Metal: 262144
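
The "Astc" column can be checked without a device. Below, the block-counting part of the test above is pulled out into plain C++ (the helper name is hypothetical). It computes only the linear, tightly packed data size and deliberately ignores any GPU tiling or page alignment:

```cpp
#include <cassert>

// Tightly packed ASTC size of a square texture with edge `n` and `mips` levels
// (n, n/2, ... as in the Metal test), using dim x dim blocks of 16 bytes each.
long astcPackedBytes(int n, int mips, int dim) {
    assert(n > 0 && mips > 0 && dim > 0);
    long blocks = 0;
    for (int j = 0; j < mips; j++) {
        int edge = n >> j;
        if (edge < 1) edge = 1;                // mip edges never go below 1 texel
        long perRow = (edge + dim - 1) / dim;  // a partial block still costs a full block
        blocks += perRow * perRow;
    }
    return blocks * 16;
}
```

For example, astcPackedBytes(256, 8, 12) gives 10512 and astcPackedBytes(512, 9, 8) gives 87408, matching the measurements above.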

I also tried using MTLHeaps (placement and automatic), hoping they might do better, but saw nearly the same numbers. Is there any way to have Metal allocate these textures more compactly to save memory?

All GPUs have a minimum page size (typically 32 KB to 64 KB), so you can't really defeat that. It also affects partially resident textures, where you load chunks of mips. Personally, I also wouldn't use anything above ASTC6x6: fitting 64 or 144 samples to two colors and a few gradients is a good way to turn textures into garbage.

This is also why you need to atlas, use array textures, etc., so that each of those fills several texture pages completely. Megatexture is just an extreme case of atlasing, where the UVs and pages are dynamically mapped to parts of a single texture.
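
Atlasing helps because many small images then share the padding of a single allocation. Here is a minimal shelf-packing sketch in C++ (all names are hypothetical, and a shelf packer is only one of many packing strategies). The one ASTC-specific detail is that every sub-image origin must land on the block grid, 12 texels for ASTC12x12, so compressed blocks can be copied in unchanged:

```cpp
#include <cassert>
#include <vector>

struct Size { int w, h; };
struct Placement { int x, y; };

static int alignUp(int v, int a) { return (v + a - 1) / a * a; }

// Places sub-images left to right in rows ("shelves") of an atlas atlasW texels
// wide. Sizes are rounded up to the block size, so every origin is block aligned.
// Returns the atlas height used, or -1 if an image is wider than the atlas.
int packShelves(const std::vector<Size>& images, int atlasW, int blockDim,
                std::vector<Placement>& out) {
    assert(atlasW > 0 && blockDim > 0);
    int x = 0, y = 0, shelfH = 0;
    out.clear();
    for (const Size& s : images) {
        int w = alignUp(s.w, blockDim);
        int h = alignUp(s.h, blockDim);
        if (w > atlasW) return -1;
        if (x + w > atlasW) {  // no room left on this shelf: start a new one
            y += shelfH;
            x = 0;
            shelfH = 0;
        }
        out.push_back({x, y});
        x += w;
        if (h > shelfH) shelfH = h;
    }
    return y + shelfH;
}
```

Packing three 100x100 images into a 256-texel-wide atlas with 12x12 blocks puts them at (0, 0), (108, 0) and (0, 108) and uses 216 rows.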

Thanks for the reply. I know some amount of padding is necessary, but allocating 32 KB for a 10 KB 256x256 ASTC12x12 texture is somewhat brutal. Even ASTC6x6 behaves similarly: it allocates 96 KB for a 40 KB 256x256 ASTC6x6 texture. Only ASTC4x4 and 8x8 come close to using "most" of their allocations.

I got curious what it's like on Android/Vulkan, so I wrote a similar test there and ran it on an Adreno 540 phone (from 2017). The results are much closer to what I was expecting:

VK_FORMAT_ASTC_12x12_UNORM_BLOCK
D/VK-SAMPLE: 128x128, mips 7, Align: 64, Astc: 2768, Size: 3584
D/VK-SAMPLE: 256x256, mips 8, Align: 64, Astc: 10512, Size: 12032
D/VK-SAMPLE: 512x512, mips 9, Align: 64, Astc: 40096, Size: 42304
D/VK-SAMPLE: 1024x1024, mips 10, Align: 64, Astc: 158432, Size: 163392

D/VK-SAMPLE: 128x128, mips 1, Align: 64, Astc: 1888, Size: 2304
D/VK-SAMPLE: 256x256, mips 1, Align: 64, Astc: 7520, Size: 9216
D/VK-SAMPLE: 512x512, mips 1, Align: 64, Astc: 29360, Size: 30976
D/VK-SAMPLE: 1024x1024, mips 1, Align: 64, Astc: 117424, Size: 123904

VK_FORMAT_ASTC_8x8_UNORM_BLOCK
D/VK-SAMPLE: 128x128, mips 7, Align: 64, Astc: 5488, Size: 5888
D/VK-SAMPLE: 256x256, mips 8, Align: 64, Astc: 21872, Size: 22272
D/VK-SAMPLE: 512x512, mips 9, Align: 64, Astc: 87408, Size: 87808
D/VK-SAMPLE: 1024x1024, mips 10, Align: 64, Astc: 349552, Size: 349952

D/VK-SAMPLE: 128x128, mips 1, Align: 64, Astc: 4096, Size: 4096
D/VK-SAMPLE: 256x256, mips 1, Align: 64, Astc: 16384, Size: 16384
D/VK-SAMPLE: 512x512, mips 1, Align: 64, Astc: 65536, Size: 65536
D/VK-SAMPLE: 1024x1024, mips 1, Align: 64, Astc: 262144, Size: 262144

Again, I'm not expecting exactly the same numbers or anything, but any reduction would be extremely helpful. I also didn't find much discussion of this issue online, so I was hoping someone might have an explanation.

It's not really padding, it's page alignment. On consoles, you could sometimes pack buffer data into the unused bytes. Android has similar hardware and alignment requirements.

Sometimes mips have to be aligned to the page size, but maybe the Metal team can say more here. It's GPU-specific, so I doubt they'll want to commit to details. Also, hardware often has a packed mip tail to reduce page use for the smaller mip levels.
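
To see why a packed mip tail matters, here is a toy model in C++ (hypothetical names; it is not the layout of any specific GPU or of Metal). It compares starting every level on its own page with packing all levels smaller than a page into one shared tail:

```cpp
#include <cassert>
#include <vector>

static long roundUp(long v, long a) { return (v + a - 1) / a * a; }

// Layout A: every mip level starts on its own page boundary.
long pageAlignedEachLevel(const std::vector<long>& levels, long page) {
    long total = 0;
    for (long s : levels) total += roundUp(s, page);
    return total;
}

// Layout B: levels of at least one page stay page aligned; smaller levels are
// packed back to back into a single "mip tail" that is rounded up once.
long withPackedMipTail(const std::vector<long>& levels, long page) {
    long total = 0, tail = 0;
    for (long s : levels) {
        if (s >= page) total += roundUp(s, page);
        else tail += s;
    }
    return total + roundUp(tail, page);
}
```

Feeding it the tightly packed level sizes of a 256x256 ASTC12x12 chain (7744, 1936, 576, 144, 64, 16, 16, 16 bytes, which sum to the 10512 reported above) with a 16 KB page gives 131072 bytes for layout A and 16384 for layout B.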

Is your output showing the sizes of the buffers you generated to upload to the Vulkan textures? Those hold texture data in linear block order, not the tiled block order used by the hardware, which may be aligned to texture pages. Some systems also have to pad mips out to a power-of-two size.
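
A small C++ model of "padding a mip out to a power-of-two size" (hypothetical helper; nothing in this thread confirms that Apple GPUs do exactly this): each level's block grid is rounded up to the next power of two in each dimension before its 16-byte blocks are counted.

```cpp
#include <cassert>

static long nextPow2(long v) {
    long p = 1;
    while (p < v) p <<= 1;
    return p;
}

// Bytes for one ASTC level if its block grid is padded to powers of two.
long pow2PaddedLevelBytes(int width, int height, int blockW, int blockH) {
    assert(width > 0 && height > 0 && blockW > 0 && blockH > 0);
    long bx = (width + blockW - 1) / blockW;   // blocks per row
    long by = (height + blockH - 1) / blockH;  // blocks per column
    return nextPow2(bx) * nextPow2(by) * 16;
}
```

For single-mip ASTC12x12 this gives 4096, 16384 and 65536 bytes at 128x128, 256x256 and 512x512, the same values Metal reported, but 262144 at 1024x1024, where Metal reported 147456, so it is at best a partial explanation.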

On desktop the tile size is 64 KB; on iOS it is 16 KB. Here are the corresponding tile dimensions:

Format         Desktop (64 KB)   Mobile (16 KB)
ASTC/BC7       256x256           128x128
BC1/ETC R11    512x256           256x128
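
Those dimensions follow from simple arithmetic: a page holds pageBytes / blockBytes blocks (16-byte blocks for ASTC and BC7, 8-byte blocks for BC1 and R11), laid out as a power-of-two grid. The C++ sketch below reproduces the table; the square-or-2:1 grid shape is an assumption chosen to match it, and the table's ASTC row corresponds to 4x4 blocks. Because ASTC blocks are always 16 bytes, the same 16 KB tile covers 384x384 texels of ASTC12x12.

```cpp
#include <cassert>

struct TileDims { int width, height; };

// Texel dimensions of one page-sized tile. Assumes the page holds a power-of-two
// number of blocks arranged square, or twice as wide as tall for odd powers of two.
TileDims tileTexels(int pageBytes, int blockBytes, int blockW, int blockH) {
    assert(pageBytes > 0 && blockBytes > 0 && pageBytes % blockBytes == 0);
    int blocks = pageBytes / blockBytes;  // e.g. 16384 / 16 = 1024
    int log2 = 0;
    while ((1 << (log2 + 1)) <= blocks) ++log2;
    int rows = 1 << (log2 / 2);           // block rows per tile
    int cols = blocks / rows;             // block columns per tile
    return {cols * blockW, rows * blockH};
}
```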

I included examples with only 1 mip, and they show similar alignment/padding, so we can rule out mips as the sole reason.

In the Vulkan example, those numbers are the size of the allocation bound to each image. It should not be linear memory, since the image was created with VK_IMAGE_TILING_OPTIMAL. I really don't want to focus on this, though. I only meant it as an example that, on a similar enough platform, an image/texture's allocated size can be very close to the original data size.

I'm interested in any setting, flag, etc. that might help. If it's a bug (I'm doubtful), that would also be great, but if Apple says this is simply "the way it has to be", that's a valid answer too.

It's exactly what I mentioned. Smaller images may use some page sub-allocation strategy. Small buffers in Metal are sub-allocated from a larger 128 KB buffer, but that's harder with texture data. Note that on Intel Macs the page size is larger, so the need to atlas is even greater there.
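
The sub-allocation idea can be sketched as a simple bump allocator in C++ (hypothetical class, not Metal's internal allocator): it hands out aligned offsets inside one large backing allocation, such as a 128 KB buffer, so several small resources share pages instead of each being rounded up on its own. In Metal, per-texture size and alignment would come from heapTextureSizeAndAlignWithDescriptor: and the offset would go to a placement heap's newTextureWithDescriptor:offset:, although the original poster reports heaps gave nearly the same totals.

```cpp
#include <cassert>
#include <cstddef>

class BumpSubAllocator {
public:
    explicit BumpSubAllocator(std::size_t capacity) : capacity_(capacity) {}

    // Returns the offset of the new block, or -1 if it does not fit.
    // `alignment` must be a power of two.
    long allocate(std::size_t size, std::size_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        std::size_t offset = (used_ + alignment - 1) & ~(alignment - 1);
        if (offset + size > capacity_) return -1;
        used_ = offset + size;
        return static_cast<long>(offset);
    }

    std::size_t used() const { return used_; }

private:
    std::size_t capacity_;
    std::size_t used_ = 0;
};
```

For example, in a 128 KB allocator with 256-byte alignment, a 10512-byte block lands at offset 0 and a following 2768-byte block at offset 10752.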

256x256, mips 8, Astc: 21872, Metal: 32768 <- the mips add up to about 16K x 4/3 = 21845, but with alignment and a 16K page size this is padded out to 32K
256x256, mips 1, Astc: 16384, Metal: 16384 <- an exact fit: 32x32 blocks x 16 B = 16K

That holds for ASTC8x8, but 12x12 clearly does not fit the pattern, and that is the main issue.

256x256, mips 8, Astc: 10512, Metal: 32768    Why not 16k?
512x512, mips 9, Astc: 40096, Metal: 98304    Why not 48k or even 64k?
512x512, mips 1, Astc: 29584, Metal: 65536    Why not 32k?

I fail to see how the above numbers can be explained by a 16 KB page size alone.
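
The expectation in these questions is just page rounding of the tightly packed size. A one-line C++ helper (hypothetical name; the 16 KB page is taken from the earlier reply) makes the gap explicit:

```cpp
#include <cassert>

// Round a byte count up to a whole number of pages (16 KB by default).
long roundUpToPage(long bytes, long pageBytes = 16 * 1024) {
    assert(bytes >= 0 && pageBytes > 0);
    return (bytes + pageBytes - 1) / pageBytes * pageBytes;
}
```

It gives 32768 for the 8x8 case (21872 bytes), matching Metal, but 16384, 49152 and 32768 for the three 12x12 cases (10512, 40096 and 29584 bytes), versus the 32768, 98304 and 65536 that Metal reported.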
