TBDR Persistent Threadgroup Memory/Mid-Render Kernel

Question

lducot2 OP

Created Nov ’21

Hello -

I am in the early phase of developing an algorithm and was hopeful someone could help me understand how threadgroup memory persists before I go too far down the wrong path.

For simplicity, let's say I am working with 32 KB of threadgroup memory, and I have two kernels K1 and K2.

In the first pass, each threadgroup in K1 loads 8192 32-bit values into threadgroup memory (using all 32 KB).

In the next pass, K2 accesses the threadgroup memory from K1 and performs some operation on the data.

Since threadgroup memory usually persists only during the lifetime of the threadgroup, in this mid-render kernel example, what can K2 access from the threadgroup memory in K1?

For example, say we have:

kernel void K1(threadgroup uint *mem_k1 [[ threadgroup(0) ]]);
kernel void K2(threadgroup uint *mem_k2 [[ threadgroup(0) ]]);

Say we launch both kernels with 10 threadgroups. Can K2 access every block of threadgroup memory initialized by K1? Or does [[ threadgroup(0) ]] refer to only one 32 KB block of memory?

If we launch K1 and K2 with a different number of threadgroups per grid, does that change anything?

Or is [[threadgroup(0)]] completely dependent on what the host code allocates via the Metal API?

Thank you in advance.

Answered by Graphics and Games Engineer in 697298022

Answer 1

Graphics and Games Engineer OP

Apple

Dec ’21

Accepted Answer

Tile shading doesn't give you control over the number of threadgroups per grid. However, a threadgroup at a given position sees the contents of threadgroup memory as populated by the threadgroup of the previous kernel that ran at the same position. This is like a depth-first execution of the kernels for a given position, as opposed to running all the threadgroups of one kernel, followed by all the threadgroups of the next kernel, and so on.
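
To make the depth-first ordering concrete, here is a minimal CPU-side sketch in C++. It is only a mental model, not Metal code: `TileMemory`, `K1`, `K2`, `runDepthFirst`, and `runBreadthFirst` are hypothetical names, and a single reused buffer stands in for the on-chip threadgroup memory. The breadth-first function is included purely for contrast and assumes one block of memory is reused for every position, which is an assumption of this sketch rather than a statement about the hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model (not the Metal API): 8192 32-bit words = 32 KB.
constexpr std::size_t kWords = 8192;
using TileMemory = std::vector<std::uint32_t>;

// Value that the stand-in K1 writes at word i for a given position.
inline std::uint32_t tagged(std::uint32_t position, std::size_t i) {
    return position * 100000u + static_cast<std::uint32_t>(i);
}

// Stand-in for K1: populate threadgroup memory for one position.
void K1(TileMemory& mem, std::uint32_t position) {
    assert(mem.size() == kWords);
    for (std::size_t i = 0; i < mem.size(); ++i) mem[i] = tagged(position, i);
}

// Stand-in for K2: report whether memory holds K1's data for this position.
bool K2(const TileMemory& mem, std::uint32_t position) {
    for (std::size_t i = 0; i < mem.size(); ++i) {
        if (mem[i] != tagged(position, i)) return false;
    }
    return true;
}

// Depth-first: K1 then K2 run back to back for each position.
// Returns how many positions K2 found its own K1 data.
std::uint32_t runDepthFirst(std::uint32_t positions) {
    TileMemory mem(kWords);
    std::uint32_t matches = 0;
    for (std::uint32_t p = 0; p < positions; ++p) {
        K1(mem, p);
        if (K2(mem, p)) ++matches;
    }
    return matches;
}

// Breadth-first (contrast only): every K1 runs before any K2,
// so the single reused buffer keeps only the last position's data.
std::uint32_t runBreadthFirst(std::uint32_t positions) {
    TileMemory mem(kWords);
    std::uint32_t matches = 0;
    for (std::uint32_t p = 0; p < positions; ++p) K1(mem, p);
    for (std::uint32_t p = 0; p < positions; ++p) {
        if (K2(mem, p)) ++matches;
    }
    return matches;
}
```

In this model, calling `runDepthFirst(10)` reports a match at all 10 positions, while `runBreadthFirst(10)` reports a match only at the last one. That is the practical upshot of the answer: K2 can rely on the data K1 left at the same position, but not on data from any other position.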
