16" MBP w/AMD doesn't support MTLCounterSamplingPointAtStageBoundary

This is the latest Intel Mac, running an AMD 5500, and it can't sample timings at stage boundaries? How are we supposed to write timing code consistently for macOS and iOS if that's the case? Do I then have to add several thousand samples, one per draw call, and accumulate them? I don't remember the docs or sample code pointing this out.

Our app compiles to deploy on macOS 10.15. Does setting that higher help with this?

MTLCounterSamplingPointAtStageBoundary is not supported, startOfVertexSampleIndex must be MTLCounterDontSample.

MTLCounterSamplingPointAtStageBoundary is not supported, startOfFragmentSampleIndex must be MTLCounterDontSample

Could stage boundary and depthClamp support (which the docs erroneously list as family v4_1 instead of v2_4) be added to the Metal Feature Set docs, so I don't waste time writing workarounds for missing functionality?

Hi AlecazamTGC,

It is unfortunately not possible to sample AMD GPUs at stage boundaries. This comes down to differences in the hardware. The MTLDevice protocol offers a device query you can use to determine what sampling points are supported by the GPU you're using. Please check out https://developer.apple.com/documentation/metal/mtldevice/3564459-supportscountersampling
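
As a minimal sketch of that device query (the function name and the string labels are made up for illustration; only supportsCounterSampling and the MTLCounterSamplingPoint cases come from Metal), something like this could be run once at startup to decide which timing path to take:

```swift
import Metal

// Hypothetical helper: lists which counter sampling points the GPU reports
// as supported. Requires macOS 11+ / iOS 14+ for supportsCounterSampling.
@available(macOS 11.0, iOS 14.0, *)
func supportedSamplingPoints(for device: MTLDevice) -> [String] {
    let points: [(String, MTLCounterSamplingPoint)] = [
        ("stage", .atStageBoundary),
        ("draw", .atDrawBoundary),
        ("dispatch", .atDispatchBoundary),
        ("tileDispatch", .atTileDispatchBoundary),
        ("blit", .atBlitBoundary),
    ]
    // Keep only the labels whose sampling point the device supports.
    return points.filter { device.supportsCounterSampling($0.1) }.map { $0.0 }
}
```

Going by this thread, the AMD GPU would be expected to leave "stage" out of the result, but the actual output depends on the hardware and OS version.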

For your use case, you could take a sample at the beginning of your pass (before your first draw call) and another one at the end (after your last draw call). However, keep in mind that the rendering models of AMD and Apple GPUs are significantly different, so timings might not be directly comparable.
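
A rough sketch of that begin/end approach, assuming the device reports support for .atDrawBoundary and exposes the common timestamp counter set. The helper names (makeTimestampBuffer, encodeTimedPass, resolveTimestamps) are invented for this example and error handling is reduced to returning nil:

```swift
import Metal

// Hypothetical helper: creates a sample buffer for the timestamp counter set.
@available(macOS 11.0, iOS 14.0, *)
func makeTimestampBuffer(device: MTLDevice, sampleCount: Int) -> MTLCounterSampleBuffer? {
    guard let timestampSet = device.counterSets?.first(where: {
        $0.name == MTLCommonCounterSet.timestamp.rawValue
    }) else { return nil }
    let descriptor = MTLCounterSampleBufferDescriptor()
    descriptor.counterSet = timestampSet
    descriptor.storageMode = .shared   // lets the CPU resolve the samples later
    descriptor.sampleCount = sampleCount
    return try? device.makeCounterSampleBuffer(descriptor: descriptor)
}

// Hypothetical helper: one sample before the first draw, one after the last.
@available(macOS 11.0, iOS 14.0, *)
func encodeTimedPass(encoder: MTLRenderCommandEncoder,
                     samples: MTLCounterSampleBuffer,
                     firstIndex: Int,
                     draws: (MTLRenderCommandEncoder) -> Void) {
    encoder.sampleCounters(sampleBuffer: samples, sampleIndex: firstIndex, barrier: false)
    draws(encoder)
    encoder.sampleCounters(sampleBuffer: samples, sampleIndex: firstIndex + 1, barrier: false)
}

// Hypothetical helper: reads raw GPU timestamps back after the command
// buffer has completed.
@available(macOS 11.0, iOS 14.0, *)
func resolveTimestamps(_ samples: MTLCounterSampleBuffer, range: Range<Int>) -> [UInt64]? {
    guard let data = try? samples.resolveCounterRange(range) else { return nil }
    return data.withUnsafeBytes { raw in
        raw.bindMemory(to: MTLCounterResultTimestamp.self).map { $0.timestamp }
    }
}
```

The difference between the two resolved values gives a per-pass GPU duration in whatever units the GPU reports, which is not necessarily nanoseconds.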

Thank you for your feedback regarding the Feature Set tables. I have relayed it to the corresponding team to investigate.

Yes, I ended up using the draw stage boundaries around all of our renderPasses, and on iOS I use the stage boundary calls. I thought I was going to have to set draw boundary data on each draw call, but the draw stage boundaries were really just a timestamp to inject into the command stream. The WWDC video was helpful.
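
For anyone else hitting the validation errors above, here is a sketch of how the stage-boundary path might be guarded. The function name and index layout are assumptions made for illustration; when it returns false, the sample indices stay at their default of MTLCounterDontSample and the caller falls back to draw-boundary samples:

```swift
import Metal

// Hypothetical helper: attaches stage-boundary sampling to a render pass only
// when the device supports it. Uses four consecutive sample indices.
@available(macOS 11.0, iOS 14.0, *)
func attachStageBoundarySampling(to pass: MTLRenderPassDescriptor,
                                 device: MTLDevice,
                                 samples: MTLCounterSampleBuffer,
                                 firstIndex: Int) -> Bool {
    guard device.supportsCounterSampling(.atStageBoundary) else {
        // Leave the attachment untouched so the validation layer
        // does not reject the pass on this GPU.
        return false
    }
    let attachment = pass.sampleBufferAttachments[0]!
    attachment.sampleBuffer = samples
    attachment.startOfVertexSampleIndex = firstIndex
    attachment.endOfVertexSampleIndex = firstIndex + 1
    attachment.startOfFragmentSampleIndex = firstIndex + 2
    attachment.endOfFragmentSampleIndex = firstIndex + 3
    return true
}
```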

MTLParallelRenderCommandEncoders weren't supported, but I was able to define timers around their sub-encoders. It was a ton of code and tricky to support both macOS and iOS; I had to deal with 3 different encoders and adjust the timestamps on macOS Intel.
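
The post doesn't say how the timestamps were adjusted. One possible approach, shown here only as a sketch, is to take correlated CPU/GPU timestamp pairs (for example from MTLDevice's sampleTimestamps) before and after the frame and linearly interpolate. TimestampPair and cpuTime(forGPU:start:end:) are hypothetical names, not Metal API:

```swift
// Hypothetical type: one correlated CPU/GPU timestamp reading.
struct TimestampPair {
    var cpu: UInt64
    var gpu: UInt64
}

// Maps a raw GPU timestamp into the CPU timebase by linear interpolation
// between two correlated pairs. Arithmetic is done in Double to avoid
// unsigned underflow.
func cpuTime(forGPU gpuTime: UInt64, start: TimestampPair, end: TimestampPair) -> Double {
    let gpuSpan = Double(end.gpu) - Double(start.gpu)
    let cpuSpan = Double(end.cpu) - Double(start.cpu)
    guard gpuSpan > 0 else { return Double(start.cpu) }
    let t = (Double(gpuTime) - Double(start.gpu)) / gpuSpan
    return Double(start.cpu) + t * cpuSpan
}
```

The difference between two converted values then gives a duration in CPU timebase units, which can be compared across passes.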

It's done now and working at least for macOS 11+ and iOS 14+. Also should solve M1 timings.
