sin() isn't going to help in any case here, as I'm trying to use the packed-math features to increase ALU throughput, and there's no packed-16-bit equivalent of sin() (it would just run sin() separately on each value rather than doing two at once).
To disassemble AMD shaders, you need a GCN disassembler; the third-party CLRX (CLRadeonExtender) project provides one:
https://github.com/CLRX/CLRX-mirror
After that, you need to actually run the shader (building the pipeline state object alone may be enough; I haven't checked). It's highly advisable to do this in a small sample app that contains ONLY that one pipeline state and shader, or you'll end up with a huge disassembly file. Running it compiles the shader, which gets helpfully cached. You can find the cache location with this command:
getconf DARWIN_USER_CACHE_DIR
Inside that directory you should find your app's cache, then the GPU you're interested in, and finally a 'functions.data' file, which contains the raw shader binary (unfortunately mixed in with other data). You can then disassemble it with:
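As a rough sketch, the lookup amounts to running find under the cache directory. The directory names below are made up for illustration (the real path comes from getconf DARWIN_USER_CACHE_DIR on macOS, and the layout under it varies):

```shell
# Stand-in for the real cache dir reported by `getconf DARWIN_USER_CACHE_DIR`;
# the app-bundle and GPU subdirectory names here are invented for the demo.
CACHE_DIR="demo_cache"
mkdir -p "$CACHE_DIR/com.example.myapp/SomeGPU"
: > "$CACHE_DIR/com.example.myapp/SomeGPU/functions.data"

# The actual step: locate functions.data anywhere under the cache
find "$CACHE_DIR" -name functions.data
```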
clrxdisasm -g vega10 -r functions.data
(Replace 'vega10' with your GPU architecture; 'vega10' is correct for a Vega 56/64.)
That gives you the assembly, plus some garbage around it (other data in the file gets interpreted as shader code too). I find it best to search for "s_endpgm" (end of shader program) and work backwards. There will probably be two or more shaders in there; hopefully you can spot some obvious code to identify the one you want.
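If you want to automate that search, a small awk sketch can cut the listing into one file per shader at each s_endpgm (the input here is a fabricated two-shader listing; file names are made up):

```shell
# Fake disassembly listing with two shaders, each ending in s_endpgm
printf 's_mov_b32 s0, 0\ns_endpgm\nv_add_f32 v0, v1, v2\ns_endpgm\n' > dis.asm

# Show where each shader ends, to work backwards from
grep -n 's_endpgm' dis.asm

# Split into shader_1.asm, shader_2.asm, ... cutting after each s_endpgm
awk 'BEGIN{n=1}
     {print > ("shader_" n ".asm")}
     /s_endpgm/{close("shader_" n ".asm"); n++}' dis.asm
```

Each shader_N.asm then holds one candidate shader (plus whatever garbage preceded it), which is easier to scan than one huge file.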
Finally you'll want the relevant AMD ISA document, which is easily found on the web. The Vega ISA is here:
"Vega Instruction Set Architecture" (PDF, AMD)