References on building on-GPU data structures?

I have a number of algorithms that have worked very well for me in CPU graphics code, and I would like to experiment with them on the GPU. Most of them depend on data structures that are hard to port, such as octrees, Hilbert R-trees, and even plain queues. It is trivial to build these on the CPU and store them in MTLBuffers, but it seems much harder to construct them directly on the GPU, in a compute kernel, in a parallel streaming fashion.


I can envision a few approaches based on multiple passes (a sort of GPU-based map-reduce), perhaps taking advantage of threadgroup memory to partition the problem into pieces that are merged in a later pass.
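To make the multi-pass idea concrete, here is a rough CPU-side reference of the count / prefix-sum / scatter pattern I have in mind. It is plain C++, not Metal, and the names (CellBuckets, buildBuckets, cellOfItem) are made up for illustration. It assumes every item has already been assigned a valid cell id (for example, an octree leaf index), and the comments only guess at how each loop might map to a compute pass.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flat layout: items grouped by cell, addressed through an
// offsets table, so it could live in two plain buffers.
struct CellBuckets {
    std::vector<uint32_t> offsets; // cellCount + 1 entries; cell c owns [offsets[c], offsets[c+1])
    std::vector<uint32_t> items;   // item indices, grouped by cell
};

// Assumes every entry of cellOfItem is < cellCount.
CellBuckets buildBuckets(const std::vector<uint32_t>& cellOfItem, uint32_t cellCount) {
    CellBuckets out;

    // Pass 1 ("map"): count items per cell.
    // On a GPU this could be one thread per item doing an atomic increment.
    std::vector<uint32_t> counts(cellCount, 0);
    for (uint32_t cell : cellOfItem) {
        counts[cell] += 1;
    }

    // Pass 2 ("reduce"): exclusive prefix sum over the counts.
    // On a GPU this would be a parallel scan, possibly per threadgroup
    // with a follow-up pass that merges the partial sums.
    out.offsets.assign(cellCount + 1, 0);
    for (uint32_t c = 0; c < cellCount; ++c) {
        out.offsets[c + 1] = out.offsets[c] + counts[c];
    }

    // Pass 3 (scatter): write each item into its cell's range.
    // On a GPU the cursor bump would be an atomic fetch-add, so the order
    // inside a cell would not be deterministic as it is here.
    std::vector<uint32_t> cursor(out.offsets.begin(), out.offsets.end() - 1);
    out.items.resize(cellOfItem.size());
    for (std::size_t i = 0; i < cellOfItem.size(); ++i) {
        uint32_t cell = cellOfItem[i];
        out.items[cursor[cell]++] = static_cast<uint32_t>(i);
    }
    return out;
}
```

The appeal of this shape is that nothing needs pointers or dynamic allocation: the result is two flat arrays whose sizes are known after pass 2. Whether it extends cleanly to a full tree hierarchy is exactly the part I am unsure about.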


Clearly, Apple has done some pretty good work on the Metal Performance Shaders ray tracing functionality (probably a k-d tree), but I have no idea how construction of that data structure is divided between the CPU and GPU. I am also not sure whether Apple's SceneKit physics acceleration structures are GPU-based.


I figure there must be some great literature out there on this. Any pointers would be helpful. Extra points if they focus on Metal. ;-)

Replies

I guess SceneKit does its physics with a lot of help from Metal, as well as SIMD on the CPU.