Ray Tracing Intersection Result

My code runs a Metal ray tracing intersector several times per frame, on the same scene but with a different set of rays each time. I tried two approaches: 1) write every result to the same intersection buffer, 2) write each result to a separate buffer.


To my surprise, I see a big performance drop with approach 2. The slowdown does not scale linearly with the number of intersector executions; instead, performance drops sharply once some condition is met (for example, when other resources are in use at the same time). With approach 1, on the other hand, everything runs fine.


Approach 1 is not always a natural fit for my situation, because the subsequent shaders have to handle the reused intersection buffer in a somewhat counterintuitive way. What are the performance characteristics of intersectors? Why is this happening? What is the recommended best practice?

Replies

To be clearer: I made this comparison in an experiment where the number of result intersection buffers is the only difference. There is no simultaneous use of multiple buffers afterwards.


In my case, performance drops dramatically when the number of intersection buffers increases from 8 to 12 (whereas performance is about the same when the intersector runs 12 times and writes its results to the same buffer).

This is only an educated guess, but you may be hitting a bandwidth limit. Up to a point you have enough bandwidth for everything and you are compute- or latency-limited; past that point you are bandwidth-limited, and it gets worse with each additional buffer.
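
A rough way to put numbers on that guess is to compare how many bytes the intersector writes per frame with how much distinct memory those writes touch. The sketch below is back-of-envelope arithmetic only, not a measurement. The ray count (one ray per pixel at 1080p) and the 16-byte intersection record are assumptions, and the function names are made up for illustration; substitute your own values.

```python
# Back-of-envelope estimate. Every constant here is an assumption,
# not something taken from the original post or measured on a device.
RAY_COUNT = 1920 * 1080      # assumed: one ray per pixel at 1080p
INTERSECTION_STRIDE = 16     # assumed: bytes per intersection record
MIB = 1024 * 1024


def bytes_written_per_frame(passes, ray_count=RAY_COUNT, stride=INTERSECTION_STRIDE):
    """Bytes of intersection results written when the intersector runs `passes` times."""
    return passes * ray_count * stride


def resident_footprint(buffer_count, ray_count=RAY_COUNT, stride=INTERSECTION_STRIDE):
    """Distinct memory occupied by `buffer_count` separate intersection buffers."""
    return buffer_count * ray_count * stride


if __name__ == "__main__":
    passes = 12
    written = bytes_written_per_frame(passes) / MIB
    for label, buffers in (("1 shared buffer", 1), ("8 buffers", 8), ("12 buffers", 12)):
        footprint = resident_footprint(buffers) / MIB
        print(f"{label}: {written:.1f} MiB written per frame, {footprint:.1f} MiB footprint")
```

Under these assumptions the bytes written per frame are identical in both approaches; only the amount of distinct memory grows with the buffer count. If your real numbers look similar, the drop between 8 and 12 buffers may have more to do with the total working set crossing some threshold than with raw write traffic, which a GPU frame capture could confirm or rule out.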