Hi,
Just today I compiled the code with swiftc -O test.swift.
The swiftc version is 5.4, from the most recent Xcode 12.5, distributed along with macOS Big Sur 11.3.
Now I only get 119 GFlops. On 11.2.x I got 4500 GFlops with exactly the same code on my Mac Pro 2019 with an AMD Radeon Pro Vega II. Does anybody else observe the same behavior?
Best regards
Hi,
I found out that the performance drop in Big Sur 11.3 is due to poor performance when transferring the data to the GPU.
You can explore this by compiling and running the attached code twice:
First:
Compile it as it is with swiftc -O matrixMul.swift and run it with ./matrixMul
On my Mac Pro running macOS Big Sur 11.3.1 I get the following output:
Values in matrix A[8192 x 8192]: 1.0 uniformly
Values in matrix B[8192 x 8192]: 2.0 uniformly
Starting calculation on AMD Radeon Pro Vega II
...
Values in matrix C = A * B: 16384.0 uniformly
1'099'444'518'912 floating point operations performed
Elapsed GPU time = 1.92 seconds - 0.573 Teraflops
Second:
Comment out line 74 and line 75, and instead uncomment line 86 and line 87.
This shifts the starting point of the time measurement from the beginning of the encoding procedure to the beginning of the commit statement, i.e. the elapsed time reported reflects the time spent for the calculation on the GPU.
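For anyone who does not want to open the attachment, here is a rough sketch of what moving the start of the measurement means. This is not the code from matrixMul.swift: the variable names and the use of Date are assumptions made for illustration, and the actual buffer creation and encoding are left out. It also prints the GPU-side time that Metal itself reports through gpuStartTime and gpuEndTime on the command buffer (available on macOS 10.15 or later), which should be independent of where the host clock is started.

```swift
import Foundation
import Metal

// Hypothetical sketch, not the attached matrixMul.swift.
guard let device = MTLCreateSystemDefaultDevice(),
      let queue = device.makeCommandQueue(),
      let commandBuffer = queue.makeCommandBuffer() else {
    fatalError("No Metal device available")
}

// First variant: the clock starts before buffers are created and commands are encoded.
let encodeStart = Date()

// ... create the MTLBuffers and encode the compute pass here (omitted) ...

// Second variant: the clock starts right before the commit.
let commitStart = Date()
commandBuffer.commit()
commandBuffer.waitUntilCompleted()
let end = Date()

let totalSeconds = end.timeIntervalSince(encodeStart)
let commitSeconds = end.timeIntervalSince(commitStart)
// GPU-side execution time as reported by Metal for this command buffer.
let gpuSeconds = commandBuffer.gpuEndTime - commandBuffer.gpuStartTime

print("Encoding + execution: \(totalSeconds) s")
print("Commit to completion: \(commitSeconds) s")
print("GPU execution only:   \(gpuSeconds) s")
```

With real buffers and a real compute pass in place of the omitted part, a large gap between the first and the second number would point at the host-side setup rather than at the kernel itself.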
Compile again with swiftc -O matrixMul.swift and run ./matrixMul
This time I get:
Values in matrix A[8192 x 8192]: 1.0 uniformly
Values in matrix B[8192 x 8192]: 2.0 uniformly
Starting calculation on AMD Radeon Pro Vega II
...
Values in matrix C = A * B: 16384.0 uniformly
1'099'444'518'912 floating point operations performed
Elapsed GPU time = 0.164 seconds - 6.704 Teraflops
As can be seen, the time needed for the encoding / transfer to the GPU dominates: 1.92 seconds in total versus 0.164 seconds for the calculation itself.
This was not the case in macOS versions prior to 11.3!
On those versions the first procedure reported about 0.25 seconds on average, i.e. the time for encoding / transfer was much shorter!
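As a quick sanity check of the numbers above, the reported operation count matches n * n * (2n - 1) for n = 8192, i.e. n multiplications and n - 1 additions per output element. That formula is inferred from the printed output, not taken from the attached source, and the function names below are made up for this sketch.

```swift
import Foundation

// Floating point operations for C = A * B with n x n matrices,
// assuming n multiplications and n - 1 additions per output element.
func matMulFlops(n: Int) -> Int {
    return n * n * (2 * n - 1)
}

// Throughput in teraflops for a given elapsed time in seconds.
func teraflops(flops: Int, seconds: Double) -> Double {
    return Double(flops) / seconds / 1e12
}

let flops = matMulFlops(n: 8192)
print(flops)                                                             // 1099444518912
print(String(format: "%.3f", teraflops(flops: flops, seconds: 1.92)))   // 0.573 (11.3.1, timing includes encoding)
print(String(format: "%.3f", teraflops(flops: flops, seconds: 0.164)))  // 6.704 (11.3.1, timing starts at commit)
print(String(format: "%.3f", teraflops(flops: flops, seconds: 0.25)))   // 4.398 (before 11.3, timing includes encoding)
```

The last value is roughly in line with the 4500 GFlops mentioned in my first post.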
It seems that the data transfer / encoding in the latest macOS version is now far less efficient than in previous versions of macOS. Maybe the underlying framework is now more optimized for the M1 chips, with some drawbacks for the Mac Pro 2019 architecture?
Hope you can reproduce this as well!
Thank you
matrixMul.swift - https://developer.apple.com/forums/content/attachment/67c81476-a4a3-479e-9fcf-98c626759552
Thanks for investigating this!