I have a Swift project that generates images by operating on pixels in parallel. I run the exact same code with the same settings (81M pixels) on my 2017 iMac and on my M1 Ultra (release builds, macOS). The M1 Ultra takes nearly 2 minutes; the iMac takes 10.3 seconds. The main difference in the profiler is the calls to retain/release, which total 84 seconds on the M1 Ultra versus 2 seconds on the iMac's i7. Like, how is this even possible? It turns this $4K Mac Studio into a boat anchor as far as my needs go. Despite running 20 threads versus 8, the Mac Studio is horrifically slow, solely because of the retain/release molasses.
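A stripped-down reproduction of this kind of workload might look like the sketch below. It is hypothetical: `shade` and `renderParallel` are placeholder names, not the real generator, and the row banding is just one way to split the work. Each task writes its rows through a raw pointer into a preallocated `[UInt32]`, and the inner loop only handles integers, so no reference-counted values are created or copied per pixel.

```swift
import Dispatch

// Hypothetical stand-in for the real per-pixel computation.
// Works only on Ints, so it involves no reference-counted values.
@inline(__always)
func shade(x: Int, y: Int, width: Int) -> UInt32 {
    return UInt32(truncatingIfNeeded: y &* width &+ x)
}

/// Fills a width*height buffer in parallel, one band of rows per task.
/// All writes go through an UnsafeMutablePointer into storage allocated once
/// up front, so the hot loop never touches class instances or copies arrays.
func renderParallel(width: Int, height: Int, chunks: Int) -> [UInt32] {
    precondition(chunks > 0, "need at least one chunk")
    var pixels = [UInt32](repeating: 0, count: width * height)
    pixels.withUnsafeMutableBufferPointer { buffer in
        // baseAddress is nil only for an empty buffer; nothing to do then.
        guard let base = buffer.baseAddress else { return }
        let rowsPerChunk = (height + chunks - 1) / chunks
        DispatchQueue.concurrentPerform(iterations: chunks) { chunk in
            let startRow = chunk * rowsPerChunk
            let endRow = min(startRow + rowsPerChunk, height)
            guard startRow < endRow else { return }  // surplus chunks do nothing
            for y in startRow..<endRow {
                let row = base + y * width  // start of this row
                for x in 0..<width {
                    row[x] = shade(x: x, y: y, width: width)
                }
            }
        }
    }
    return pixels
}
```

If a reduction along these lines runs at comparable speed on both machines, that would suggest the retain/release traffic comes from reference-counted values (class instances, closures, or arrays) inside the real per-pixel code rather than from the threading itself.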
This is not a GUI app but a command-line tool. Instruments seems hopeless at telling me exactly what is calling retain/release. I can see that the ARM and Intel code generation differ, but otherwise the rest of the code seems equivalent.
If someone from Apple would comment, I'd be willing to burn a support ticket (I used to work at DTS long ago). Otherwise I will post this on Medium to see if anyone can explain it.
This is the whole reason I spent so much on the Mac Studio, and now it seems like a waste of money.