Whoever made this comment, please consider forwarding the link to MetalFFT to the MPS team. Ideally, I'd like to get in contact with them about this over email; use the address posted on my GitHub profile. I've had a bad experience with Apple not responding to my reports (see the comment below about a bug), so please leave a reply when you read this.
Feedback number: FB9797575
I just made a short message directing them to the comment this falls under. Is that good enough?
I tried looking up the term “shared memory” in that context, and the results kept showing the M1’s memory sharing between CPU and GPU instead. DirectX calls it groupshared, which only complicates things.
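In case it helps anyone else searching, Metal’s name for the same concept is the `threadgroup` address space. Here’s a minimal sketch I put together (the kernel name and the reverse operation are placeholders I made up, not anything from MetalFFT) showing where the keyword goes:

```swift
import Metal

// Sketch only: Metal's equivalent of CUDA __shared__ / HLSL groupshared is
// memory in the `threadgroup` address space. Kernel name and logic are made up.
let source = """
#include <metal_stdlib>
using namespace metal;

kernel void reverseWithinThreadgroup(device float *data         [[buffer(0)]],
                                     threadgroup float *scratch [[threadgroup(0)]],
                                     uint lid       [[thread_position_in_threadgroup]],
                                     uint gid       [[thread_position_in_grid]],
                                     uint groupSize [[threads_per_threadgroup]])
{
    // Visible to every thread in the same threadgroup, just like shared memory.
    scratch[lid] = data[gid];
    threadgroup_barrier(mem_flags::mem_threadgroup);
    data[gid] = scratch[groupSize - 1 - lid];
}
"""

let device = MTLCreateSystemDefaultDevice()!
let library = try! device.makeLibrary(source: source, options: nil)
let pipeline = try! device.makeComputePipelineState(
    function: library.makeFunction(name: "reverseWithinThreadgroup")!)
// On the host side, the size of the threadgroup(0) argument is set at encode
// time with setThreadgroupMemoryLength(_:index:).
```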
I have a feeling that my last post on this thread didn't get through to you because it wasn't a reply to a comment you made.
ARHeadsetKit already has all the capabilities of XROS, but that potential is never going to be realized because nobody knows about it and Apple isn’t behind my effort.
Those don’t include AR headset experiences, and in my opinion you should learn content that prepares you for the future of AR headset technology. Even if ARHeadsetKit is never used, learning it gets you in the mindset of building 3D interfaces and accounting for stereoscopic rendering.
Also, you might be interested in the ARHeadsetKit tutorial series (it’s linked from the page I shared above). Thanks for your interest!
The Neural Engine can't be used for training. It supports only IEEE 16-bit half precision, not bfloat16, and half precision's narrow exponent range makes small gradients underflow to zero, so gradients can't reliably propagate through it for ML. The ANE can still be used for inference, though. If only Apple did what everybody else is doing and added bfloat16 acceleration or GPU matrix cores! Kudos to them for AMX on the CPU at least.
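To put numbers on that, here's a minimal sketch in plain Swift (Float16 needs Apple silicon; Swift has no BFloat16 type, so the bfloat16 value is emulated by truncating a Float's mantissa):

```swift
// IEEE half (Float16) bottoms out around 6e-8, so tiny gradients vanish.
// bfloat16 keeps float32's exponent range (down to ~1e-38) and gives up
// mantissa bits instead, which is why it is preferred for training.
let gradient: Float = 1e-9              // a plausibly tiny gradient value

let asHalf = Float16(gradient)
print(asHalf)                           // 0.0 -> the weight update is lost

// Crude bfloat16 emulation: keep only the top 16 bits of the Float pattern.
let asBFloat16 = Float(bitPattern: gradient.bitPattern & 0xFFFF_0000)
print(asBFloat16)                       // ~1e-9, just with less precision
```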
The Metal backend is still nowhere near done, but I recommend looking at Swift for TensorFlow's repositories (linked in my reply above) and Swift-Colab.
Would it help if I linked the file in MetalFFT that documents the profiling concerns? I mirrored the FFT sizes @CaptainHarlock gave me and recorded benchmarks showing system-level cache thrashing. That cache bottleneck was one reason I gave up on MetalFFT, but Apple might be better suited to investigating it. This is about the GPU implementation, not vDSP, so I don't know whether it's relevant.
Please disregard the replies above.
Please disregard this reply. The repository’s license has been changed to remove any restrictions that may have been mentioned above.
I can test it for you; I have both an Apple silicon Mac and an Intel Mac. But I strongly recommend that you thoroughly read through my MetalFFT project first. In fact, if you could port code from VkFFT to MetalFFT, you’d complete the project. I don’t have much time to spend, but we could work out a plan where I test or translate code for you.
If you want the best performance, you need to write native Metal shaders rather than go through a translation layer like MoltenVK or SPIR-V cross-compilation. And if the hard lessons of MetalFFT taught me anything, it’s that performance can surprise you.
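For context, this is roughly what the host side of “native Metal” looks like; a minimal compute dispatch sketch where `fftStage` is a placeholder kernel name I made up, not something from MetalFFT:

```swift
import Metal

let device = MTLCreateSystemDefaultDevice()!
let queue = device.makeCommandQueue()!

// `fftStage` is a placeholder; a real FFT would chain several dispatches like this.
let library = device.makeDefaultLibrary()!
let pipeline = try! device.makeComputePipelineState(
    function: library.makeFunction(name: "fftStage")!)

let elementCount = 4096
let buffer = device.makeBuffer(length: elementCount * MemoryLayout<Float>.stride,
                               options: .storageModeShared)!

let commandBuffer = queue.makeCommandBuffer()!
let encoder = commandBuffer.makeComputeCommandEncoder()!
encoder.setComputePipelineState(pipeline)
encoder.setBuffer(buffer, offset: 0, index: 0)

// dispatchThreads lets Metal handle nonuniform threadgroups (fine on Apple GPUs).
encoder.dispatchThreads(
    MTLSize(width: elementCount, height: 1, depth: 1),
    threadsPerThreadgroup: MTLSize(width: pipeline.threadExecutionWidth, height: 1, depth: 1))
encoder.endEncoding()

commandBuffer.commit()
commandBuffer.waitUntilCompleted()
```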
Also, you can finish MetalFFT on your own if you can access an iPad and download Swift Playgrounds. That’s a benefit of MetalFFT being written entirely in Swift.
I've noticed a strange phenomenon where certain posts on the developer forums don't show up except when I view them under my login (e.g. one of the MetalFX comments). If the comment above was in fact censored, I apologize for addressing someone in such an unprofessional way and prompting that action. My main motivation: Apple is lagging behind other vendors in HPC, and I think we all want that to change.