My code runs a Metal ray tracing intersector several times per frame, on the same scene but with a different set of rays each time. I tried two approaches: 1) write the results of every pass into the same intersection buffer, 2) write the results of each pass into a separate buffer.
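To make the two approaches concrete, here is a rough sketch of the encoding loop. It assumes the intersector is an `MPSRayIntersector` from Metal Performance Shaders and that the default ray layout is used. The helper `encodePasses` and its parameters are hypothetical names for illustration, not my actual code, and a real app would allocate the output buffers once instead of every frame.

```swift
import Metal
import MetalPerformanceShaders

// Hypothetical helper: encodes one intersection pass per ray buffer against one scene.
// shareOutput == true  -> approach 1: every pass overwrites a single intersection buffer.
// shareOutput == false -> approach 2: every pass writes to its own intersection buffer.
// Returns the intersection buffers that were written.
func encodePasses(commandBuffer: MTLCommandBuffer,
                  intersector: MPSRayIntersector,
                  accelerationStructure: MPSAccelerationStructure,
                  rayBuffers: [MTLBuffer],
                  rayCount: Int,
                  shareOutput: Bool) -> [MTLBuffer] {
    // Assumption: distance + primitive index + barycentric coordinates per hit.
    intersector.intersectionDataType = .distancePrimitiveIndexCoordinates
    let stride = MemoryLayout<MPSIntersectionDistancePrimitiveIndexCoordinates>.stride
    let length = stride * rayCount

    var outputs: [MTLBuffer] = []
    for rays in rayBuffers {
        let output: MTLBuffer
        if shareOutput, let shared = outputs.first {
            // Approach 1: reuse the buffer created for the first pass.
            output = shared
        } else {
            // Approach 2 (and the first pass of approach 1): allocate a buffer.
            guard let fresh = commandBuffer.device.makeBuffer(length: length,
                                                              options: .storageModePrivate) else {
                fatalError("Could not allocate intersection buffer")
            }
            output = fresh
            outputs.append(fresh)
        }

        intersector.encodeIntersection(commandBuffer: commandBuffer,
                                       intersectionType: .nearest,
                                       rayBuffer: rays,
                                       rayBufferOffset: 0,
                                       intersectionBuffer: output,
                                       intersectionBufferOffset: 0,
                                       rayCount: rayCount,
                                       accelerationStructure: accelerationStructure)

        // With approach 1, any shader that consumes `output` must be encoded here,
        // before the next pass overwrites the buffer.
    }
    return outputs
}
```

With `shareOutput` set to `false`, the total intersection buffer memory grows with the number of passes, which is the configuration where I see the drop.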
To my surprise, I see a big performance drop with approach 2. The slowdown does not grow linearly with the number of intersector passes; instead, performance drops sharply once some condition is hit (for example, when other resources are in use at the same time). With approach 1, on the other hand, everything runs fine.
Approach 1 is not always natural for my situation, because the subsequent shaders have to handle the reused intersection buffer in a somewhat counterintuitive way. What are the performance characteristics of intersectors? Why is this happening? What is the recommended best practice?