In the WWDC talks on Metal that I have watched so far, many of the videos discuss Apple's A-series (A11, A12, etc.) chips and the power they give to the developer, such as letting developers leverage tile memory by opting in to TBDR. On macOS (at least on Intel Macs without the M1 chip), TBDR is unavailable, and other features that leverage tile memory, like imageblocks, are also unavailable. That made me wonder about the structure of the GPUs on macOS and of external GPUs like the Blackmagic eGPU (which is currently hooked up to my computer). Is the concept of tile memory ubiquitous across GPU architectures?
For example, suppose in a Metal kernel function we declared

    threadgroup float tgfloats[16];

Is this value stored in tile memory (threadgroup memory) on the Blackmagic? Or is there equivalent storage that depends on the hardware but is available on all hardware in some form?

I know there are some WWDC sessions that deal with multiple GPUs, which will probably be helpful, but extra information is always useful. Any links to information about GPU hardware architectures would be appreciated as well.
On M1, the tiles that store render target data during fragment shader execution are reused as threadgroup memory when a compute kernel executes. Although AMD and Intel GPUs do not have tile memory, since they are immediate mode renderers (IMR), they do have dedicated threadgroup memory caches for compute kernels. The characteristics of these caches, including bandwidth and size, differ.
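To make this concrete, here is a minimal sketch of a compute kernel that uses the `threadgroup` array from the question to sum 16 values per threadgroup. The kernel name, buffer bindings, and threadgroup size are illustrative, not from the original post. The point is that the source code is identical on every GPU; only the physical backing of the `threadgroup` allocation differs (tile memory on M1, a dedicated threadgroup memory cache on AMD/Intel IMR GPUs).

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical per-threadgroup sum, assuming a threadgroup size of 16.
kernel void partialSums(device const float *input  [[buffer(0)]],
                        device float       *output [[buffer(1)]],
                        uint tid  [[thread_position_in_threadgroup]],
                        uint gid  [[thread_position_in_grid]],
                        uint tgid [[threadgroup_position_in_grid]])
{
    // Backed by tile memory on M1, by a dedicated cache on IMR GPUs.
    threadgroup float tgfloats[16];

    tgfloats[tid] = input[gid];
    threadgroup_barrier(mem_flags::mem_threadgroup);

    // Tree reduction within the 16-thread group.
    for (uint stride = 8; stride > 0; stride >>= 1) {
        if (tid < stride) {
            tgfloats[tid] += tgfloats[tid + stride];
        }
        threadgroup_barrier(mem_flags::mem_threadgroup);
    }

    // Thread 0 writes the threadgroup's partial sum.
    if (tid == 0) {
        output[tgid] = tgfloats[0];
    }
}
```

Because the bandwidth and size of threadgroup storage differ across GPUs, the ideal threadgroup size for a kernel like this is hardware-dependent, even though the code itself is portable.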
M1 and the iOS GPUs have some features that make combining compute with rendering more efficient, including tile shaders and imageblocks. These allow you to mix compute kernels with rendering and to use the on-chip tile memory to share data between fragment shaders and compute kernels.

Although AMD and Intel GPUs do not have these features, their immediate mode rendering architectures make mixing separate render and compute passes less costly than on M1 and iOS GPUs and, in many cases, allow them to overcome the advantages of tile shaders.
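As a rough illustration of what a tile shader looks like, here is a hedged sketch of a tile kernel that reads and writes the current render target data through an imageblock, based on the pattern shown in Apple's Metal 2 material. The kernel name is hypothetical; a kernel like this is dispatched mid-render-pass (via `dispatchThreadsPerTile(_:)` on the render command encoder) and operates entirely on tile memory, with no round trip through device memory. It only runs on Apple GPU families that support tile shading.

```metal
#include <metal_stdlib>
using namespace metal;

// Illustrative tile kernel: inverts the color currently in the tile.
// The implicit imageblock layout matches the render pass's attachments.
kernel void invertTile(imageblock<float4, imageblock_layout_implicit> img,
                       ushort2 tid [[thread_position_in_threadgroup]])
{
    float4 color = img.read(tid);
    img.write(float4(1.0 - color.rgb, color.a), tid);
}
```

On an IMR GPU you would express the same operation as a separate compute pass (or a fullscreen fragment pass) over a texture, which is exactly the pattern the answer above says those architectures handle comparatively cheaply.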