Help finding full Metal2 API operations list...

I am trying to emulate double precision real math operations for HPC using Mac GPUs and Metal2, but I can't find any complete documentation on Metal2 32-bit instructions.

I'm interested in any operations that would aid double precision emulation - for example, are there separate 32-bit multiply hi and multiply low commands? Is there any mechanism for adding big numbers like an add with carry instruction? (Unfortunately, the C language lacks any such mechanisms.)

I would be thrilled to find a way to implement double precision with only about a 4x performance penalty, so I'm going for integer instructions rather than a "double double" approach. Any slower than that, and it's really only worth using AMX.

Any pointers to existing documentation would be greatly appreciated. Thanks for the help.

  • Jeff

I don't think the available documentation is this precise. You can try Clang builtins such as __builtin_addc() and see if they work, but frankly, I would be very surprised if Apple GPUs supported these operations. However, Apple Silicon does support 64-bit integers; maybe you can utilise them somehow? Still, double precision at only a 4x performance penalty might be too ambitious a goal...
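If no carry-aware builtin turns out to be usable, the carry can be recovered with an unsigned comparison. Here is a minimal sketch in plain C++ (not tested as Metal shader code); `U64Pair`, `add_with_carry` and `add64` are hypothetical names made up for illustration, and nothing here assumes a hardware add-with-carry instruction.

```cpp
#include <cstdint>

// A 64-bit value stored as two 32-bit limbs (hypothetical helper type).
struct U64Pair {
    uint32_t lo;
    uint32_t hi;
};

// Adds a + b + carry_in using only 32-bit operations.
// carry_in must be 0 or 1; carry_out receives the outgoing carry (0 or 1).
inline uint32_t add_with_carry(uint32_t a, uint32_t b,
                               uint32_t carry_in, uint32_t &carry_out) {
    uint32_t partial = a + b;                 // wraps modulo 2^32
    uint32_t c1 = (partial < a) ? 1u : 0u;    // wrapped if the sum got smaller
    uint32_t sum = partial + carry_in;
    uint32_t c2 = (sum < partial) ? 1u : 0u;  // wrapped when adding the carry
    carry_out = c1 | c2;                      // both cannot be set at once
    return sum;
}

// 64-bit addition built from two chained 32-bit additions.
inline U64Pair add64(U64Pair x, U64Pair y) {
    uint32_t carry_lo = 0;
    uint32_t carry_hi = 0;  // overflow out of bit 63, discarded here
    U64Pair r;
    r.lo = add_with_carry(x.lo, y.lo, 0u, carry_lo);
    r.hi = add_with_carry(x.hi, y.hi, carry_lo, carry_hi);
    return r;
}
```

Each limb costs two adds and two compares this way, so a wide mantissa addition is already several times more expensive than a single 32-bit add.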

This repository: https://github.com/philipturner/metal-float64

The drop-downs in this comment: https://github.com/openmm/openmm/issues/3847#issuecomment-1317731445

You are correct that the best approach is 32-bit integer instructions, not double-single. Apple silicon has 64-bit integers, but they are so slow that 32-bit integers will be faster. If you are interested in helping me finish the metal-float64 library, that would be great.

A 4x performance penalty seems optimistic; I don't think it's physically possible. My theoretical calculations gave >27x for multiply and >>4x for addition, not including the time to process exponents.
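To give a feel for where the multiply cost comes from, here is a rough sketch of a full 32x32 -> 64-bit product using only 32-bit operations, written in plain C++ rather than shader code. `Wide32` and `mul32_wide` are hypothetical names, and this is not taken from metal-float64. The Metal Shading Language specification does list a `mulhi` integer function that would replace the 16-bit splitting below; the sketch makes no assumption about how fast it is on Apple GPUs.

```cpp
#include <cstdint>

// Full 64-bit product of two 32-bit values, as two 32-bit halves
// (hypothetical helper type).
struct Wide32 {
    uint32_t lo;
    uint32_t hi;
};

// Schoolbook multiply on 16-bit halves; every intermediate fits in 32 bits.
inline Wide32 mul32_wide(uint32_t a, uint32_t b) {
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    uint32_t p0 = a_lo * b_lo;  // contributes to bits 0..31
    uint32_t p1 = a_lo * b_hi;  // contributes to bits 16..47
    uint32_t p2 = a_hi * b_lo;  // contributes to bits 16..47
    uint32_t p3 = a_hi * b_hi;  // contributes to bits 32..63

    // Sum of everything landing in bits 16..31; at most 3 * 0xFFFF,
    // so it cannot overflow 32 bits.
    uint32_t mid = (p0 >> 16) + (p1 & 0xFFFFu) + (p2 & 0xFFFFu);

    Wide32 r;
    r.lo = (p0 & 0xFFFFu) | (mid << 16);
    r.hi = p3 + (p1 >> 16) + (p2 >> 16) + (mid >> 16);
    return r;
}
```

A 53-bit mantissa does not fit in one 32-bit limb, so a double-precision multiply needs several of these wide products plus carry propagation between them, before any exponent handling or rounding.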
