I am trying to emulate double-precision floating-point operations for HPC on Mac GPUs with Metal 2, but I can't find any complete documentation on Metal 2's 32-bit integer instructions.
I'm interested in any operations that would aid double-precision emulation. For example, are there separate 32-bit multiply-high and multiply-low instructions? Is there any mechanism for adding big numbers, such as an add-with-carry instruction? (Unfortunately, the C language itself lacks any such mechanisms.)
I would be thrilled to find a way to implement double precision with only about a 4x performance penalty, so I'm going for integer instructions rather than a "double double" approach. If it ends up any slower than that, I would be better off just using AMX.
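To make the question concrete, here is a rough sketch in plain C++ (not Metal code) of the two primitives I mean, written using only 32-bit operations. The names `U32Pair`, `mul_wide`, `add_carry`, and `add64` are placeholders I made up for illustration, and I am assuming this is roughly what I would have to write by hand if no native instruction exists. A hardware multiply-high or add-with-carry would replace most of this.

```cpp
#include <cstdint>

// Hypothetical helper type: a 64-bit value held as two 32-bit limbs.
struct U32Pair { uint32_t lo; uint32_t hi; };

// Full 32x32 -> 64-bit product using only 32-bit multiplies and adds.
// Each operand is split into 16-bit halves so no partial product overflows.
constexpr U32Pair mul_wide(uint32_t a, uint32_t b) {
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    uint32_t p0 = a_lo * b_lo;  // contributes at bit 0
    uint32_t p1 = a_lo * b_hi;  // contributes at bit 16
    uint32_t p2 = a_hi * b_lo;  // contributes at bit 16
    uint32_t p3 = a_hi * b_hi;  // contributes at bit 32

    // Sum of the terms landing in bits 16..31; at most 0x2FFFD, so it fits.
    uint32_t mid = (p0 >> 16) + (p1 & 0xFFFFu) + (p2 & 0xFFFFu);

    uint32_t lo = (p0 & 0xFFFFu) | (mid << 16);
    uint32_t hi = p3 + (p1 >> 16) + (p2 >> 16) + (mid >> 16);
    return {lo, hi};
}

// Add with carry, emulated by detecting unsigned wraparound via comparison.
constexpr uint32_t add_carry(uint32_t a, uint32_t b, uint32_t carry_in,
                             uint32_t& carry_out) {
    uint32_t s = a + b;
    uint32_t c1 = s < a;          // 1 if a + b wrapped
    uint32_t r = s + carry_in;
    uint32_t c2 = r < s;          // 1 if adding the carry wrapped
    carry_out = c1 | c2;          // both cannot be set at once
    return r;
}

// 64-bit addition built from two chained 32-bit add-with-carry steps.
constexpr U32Pair add64(U32Pair x, U32Pair y) {
    uint32_t carry = 0;
    uint32_t lo = add_carry(x.lo, y.lo, 0u, carry);
    uint32_t hi = add_carry(x.hi, y.hi, carry, carry);
    return {lo, hi};
}
```

The emulated multiply costs four 32-bit multiplies plus several shifts and adds, and each emulated carry costs extra compares, which is why native instructions would matter so much for hitting the 4x target.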
Any pointers to existing documentation would be greatly appreciated.
Thanks for the help.
Jeff