Storage of `ray_data` in ray tracing payload

This is a duplicate of my StackOverflow post linked here

I am currently working with Metal's ray tracing API. I remembered I could pass data from an intersection function to the compute kernel that started the ray intersection process. After rewatching the WWDC 2020 talk Discover ray tracing with Metal by Sean James (linked here), I found the relevant section around 16:13 where he talks about the ray payload.

However, I was curious where this payload is stored as it is passed to the intersection function. When declared with the [[ payload ]] attribute in the intersection function, it must be in the ray_data address space. According to the Metal Shading Language Specification (version 2.3), pg. 64, the data passed into the intersection function is copied into the ray_data address space and copied back out once the intersection function returns. However, the specification doesn't say whether, e.g., the data is stored in tile memory (as data in the threadgroup address space is) or in per-thread memory (thread address space). The video did not specify this either.

In fact, the declarations for the intersect function (see pg. 204) that include the payload parameter take it in the thread address space (which makes sense).

So where does the ray_data copy of the payload go, given that the original lives in the thread address space in the kernel?
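To make the copy-in/copy-out behavior described above concrete, here is a minimal host-side C++ model of it. This is not Metal code and says nothing about where a GPU actually keeps the copy. The `Payload` type, its fields, and both function names are hypothetical stand-ins: `intersect_with_payload` plays the role of the intersector's intersect call with a thread-space payload, and `intersection_function` plays the role of a function that receives the payload by reference in the ray_data address space.

```cpp
// Hypothetical payload type; the name and fields are assumptions for illustration.
struct Payload {
    float hit_distance;  // closest accepted distance so far
    int   hit_count;     // number of accepted candidates
};

// Stand-in for an intersection function with a [[ payload ]] parameter.
// It only ever sees the copy it is handed, never the kernel's own variable.
inline bool intersection_function(Payload &ray_data_copy, float candidate_distance) {
    if (candidate_distance < ray_data_copy.hit_distance) {
        ray_data_copy.hit_distance = candidate_distance;
        ray_data_copy.hit_count += 1;
        return true;   // accept the candidate
    }
    return false;      // reject the candidate
}

// Stand-in for the intersect call made from the kernel.
// Where `ray_data_copy` physically lives is exactly the unspecified part.
inline bool intersect_with_payload(Payload &thread_payload, float candidate_distance) {
    Payload ray_data_copy = thread_payload;   // copy in
    bool accepted = intersection_function(ray_data_copy, candidate_distance);
    thread_payload = ray_data_copy;           // copy out after the function returns
    return accepted;
}
```

In this model the kernel's variable is only updated once the call returns, which is the behavior the specification describes. The storage of the intermediate copy is left entirely to the implementation.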
Answered by Graphics and Games Engineer in 674446022
The way the GPU stores the payload varies between devices and there is no particular size. All we can really say is that cost scales roughly with the size, so you should minimize that payload. If the payload gets too large you may run into a dramatic performance drop.
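As a rough illustration of what minimizing the payload can look like, the host-side C++ sketch below compares a wide payload with a compact one. All type, field, and function names are hypothetical, and the sizes are those of a typical host compiler, not a statement about Metal's layout rules. The sketch assumes the kernel can look the color up from the material ID after the intersect call returns, so the color need not travel in the payload.

```cpp
#include <cstdint>

// Hypothetical "carry everything" payload.
struct WidePayload {
    float         color[3];
    float         distance;
    std::uint32_t material_id;
    std::uint32_t hit;           // 0 or 1
};

// Hypothetical compact payload: the hit flag shares a word with the material ID.
struct CompactPayload {
    float         distance;
    std::uint32_t material_and_hit;  // bit 31: hit flag, bits 0..30: material ID
};

static_assert(sizeof(WidePayload) == 24, "six 4-byte fields");
static_assert(sizeof(CompactPayload) == 8, "two 4-byte fields");

// Pack the fields; material IDs are assumed to fit in 31 bits.
inline CompactPayload make_compact(float distance, std::uint32_t material_id, bool hit) {
    return {distance, (material_id & 0x7FFFFFFFu) | (hit ? 0x80000000u : 0u)};
}

// Unpack the hit flag from bit 31.
inline bool was_hit(const CompactPayload &p) {
    return (p.material_and_hit >> 31) != 0;
}

// Unpack the material ID from the low 31 bits.
inline std::uint32_t material_of(const CompactPayload &p) {
    return p.material_and_hit & 0x7FFFFFFFu;
}
```

Whether a reduction like this matters in practice depends on the device, so it is worth measuring rather than assuming.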
Question on your question …

Why would you care?
In fact, if this is not public information, the storage location could change in the future without notice.

So, one should be careful assuming anything about it.
Two reasons:
  1. Sheer curiosity

  2. I was worried that I could run out of tile memory if ray_data payloads were stored there. I was hoping to implement something like what Rich Forster mentioned in the associated talk "Get to know Metal function pointers." Around 18:00 is the relevant section. There, he talks about divergence as a result of different threads invoking different functions. The solution (described around 19:00) was to use threadgroup memory to pass around relevant data, which I thought could be constrained by the size of the payload.

Of course, I should probably assume that this would have been mentioned somewhere if it were something to consider, so I don't have to worry, but it's interesting nonetheless.

P.S. I haven't written the ray tracing kernel yet, nor the intersection/visible function(s). But it was something I considered as I was designing my program.

Sheer curiosity

That's a pretty good reason.
Hope you'll get some answer.

Have a good day.