Diagnosing poor build performance

I have a 28-core Mac Pro that takes twice as long to build my project as a 16" MacBook Pro. I want to understand why.


I'm building this project: https://github.com/gnachman/iTerm2


The build configuration is Debug. The optimization level is set to None. There is no Swift: just C, Objective-C, and Interface Builder XIBs. I'm using Xcode 11.3 on macOS 10.15.1 on both machines.


Build time summary for Mac Pro:

CompileC (728 tasks) | 1673.157 seconds
CompileXIB (58 tasks) | 649.565 seconds
TiffUtil (121 tasks) | 36.980 seconds
ProcessPCH (5 tasks) | 16.840 seconds
CopyPNGFile (26 tasks) | 12.200 seconds
StripNIB (1 task) | 9.133 seconds
CompileMetalFile (21 tasks) | 6.290 seconds
CompileAssetCatalog (1 task) | 1.688 seconds
Ld (2 tasks) | 1.528 seconds
Libtool (3 tasks) | 1.098 seconds
CodeSign (7 tasks) | 0.768 seconds
PhaseScriptExecution (1 task) | 0.248 seconds
MetalLink (2 tasks) | 0.222 seconds
DataModelCompile (1 task) | 0.171 seconds
ProcessPCH++ (2 tasks) | 0.053 seconds
Touch (1 task) | 0.003 seconds

Build time summary for MacBook Pro:

CompileC (728 tasks) | 290.118 seconds 
CompileXIB (58 tasks) | 12.626 seconds
ProcessPCH (5 tasks) | 9.958 seconds
TiffUtil (121 tasks) | 5.955 seconds
CompileMetalFile (21 tasks) | 5.135 seconds
CompileAssetCatalog (1 task) | 1.283 seconds
Ld (2 tasks) | 1.143 seconds
CopyPNGFile (26 tasks) | 1.102 seconds
CodeSign (7 tasks) | 0.734 seconds
Libtool (3 tasks) | 0.332 seconds
PhaseScriptExecution (1 task) | 0.197 seconds
DataModelCompile (1 task) | 0.196 seconds
MetalLink (2 tasks) | 0.083 seconds
ProcessPCH++ (2 tasks) | 0.046 seconds
StripNIB (1 task) | 0.012 seconds
Touch (1 task) | 0.001 seconds
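
To see which task types account for the gap, the two summaries can be compared category by category. This is a minimal sketch, assuming the `name (N tasks) | seconds` layout shown above; `parse_summary` and `slowdown` are hypothetical helper names, and only the five largest categories are copied in.

```python
import re

# Lines copied from the two build time summaries (largest categories only).
MAC_PRO = """\
CompileC (728 tasks) | 1673.157 seconds
CompileXIB (58 tasks) | 649.565 seconds
TiffUtil (121 tasks) | 36.980 seconds
ProcessPCH (5 tasks) | 16.840 seconds
CopyPNGFile (26 tasks) | 12.200 seconds
"""
MACBOOK_PRO = """\
CompileC (728 tasks) | 290.118 seconds
CompileXIB (58 tasks) | 12.626 seconds
TiffUtil (121 tasks) | 5.955 seconds
ProcessPCH (5 tasks) | 9.958 seconds
CopyPNGFile (26 tasks) | 1.102 seconds
"""

# Matches e.g. "StripNIB (1 task) | 9.133 seconds".
LINE = re.compile(r"^(\S+) \((\d+) tasks?\) \| ([\d.]+) seconds\s*$")


def parse_summary(text):
    """Return {task_type: (task_count, total_seconds)} for one summary."""
    result = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            name, count, seconds = match.groups()
            result[name] = (int(count), float(seconds))
    return result


def slowdown(slow, fast):
    """Ratio of total seconds per task type: slow machine over fast machine."""
    return {name: slow[name][1] / fast[name][1] for name in slow if name in fast}


ratios = slowdown(parse_summary(MAC_PRO), parse_summary(MACBOOK_PRO))
for name, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(f"{name:12s} {ratio:5.1f}x")
```

With these numbers, CompileXIB comes out roughly 51 times slower on the Mac Pro and CompileC roughly 5.8 times slower, so the slowdown is not limited to the compiler.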

Observations:

* Wall-clock time is 52.2 seconds on the Mac Pro vs 27.6 seconds on the MacBook Pro.

* CPU utilization on the Mac Pro is abysmal. It stays around 30%. The MacBook Pro stays at 100% while building.

* No throttling, per Intel Power Gadget. The average clock rate stays at 3.4 GHz while building.

* CPU temperature never exceeds 60 degrees C.

* I get the same results when the source and build folders are on a RAM disk on the Mac Pro.

* I/O doesn't seem to be a problem, as shown by iostat: https://pastebin.com/avjaQ3ZE

* I have tried reducing parallelism by setting the IDEBuildOperationMaxNumberOfConcurrentCompileTasks user default, but it doesn't have any effect.

* I made a script that injected timestamps into the command-line xcodebuild, which can be seen here: https://pastebin.com/yWdN27S8
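
One more number can be pulled from the figures above: the summed task time divided by the wall-clock time gives a rough count of how many tasks were in flight on average. This is a sketch only; it assumes each summary figure is the sum of the elapsed time of the individual tasks (not CPU time), and `average_tasks_in_flight` is a hypothetical helper name.

```python
# Per-category totals in seconds, copied from the two build time summaries.
MAC_PRO_TASK_SECONDS = [
    1673.157, 649.565, 36.980, 16.840, 12.200, 9.133, 6.290, 1.688,
    1.528, 1.098, 0.768, 0.248, 0.222, 0.171, 0.053, 0.003,
]
MACBOOK_PRO_TASK_SECONDS = [
    290.118, 12.626, 9.958, 5.955, 5.135, 1.283, 1.143, 1.102,
    0.734, 0.332, 0.197, 0.196, 0.083, 0.046, 0.012, 0.001,
]

# Wall-clock build times in seconds, from the first observation.
MAC_PRO_WALL_CLOCK = 52.2
MACBOOK_PRO_WALL_CLOCK = 27.6


def average_tasks_in_flight(task_seconds, wall_clock_seconds):
    """Summed task time divided by elapsed time: mean number of running tasks."""
    return sum(task_seconds) / wall_clock_seconds


print(f"Mac Pro:     {average_tasks_in_flight(MAC_PRO_TASK_SECONDS, MAC_PRO_WALL_CLOCK):.1f}")
print(f"MacBook Pro: {average_tasks_in_flight(MACBOOK_PRO_TASK_SECONDS, MACBOOK_PRO_WALL_CLOCK):.1f}")
```

Under that assumption the Mac Pro averages about 46 tasks in flight against about 12 on the MacBook Pro. Combined with the 30% CPU utilization, that would suggest the Mac Pro's tasks spend much of their time waiting on something rather than computing.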


My best theory is that enough of the work is not parallelizable that the lower base clock rate is the problem, but I'm surprised by how big the effect is, considering there are over 600 translation units.
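
The "not parallelizable" theory can be sanity-checked with Amdahl's law, which bounds the speedup from N workers when only a fraction p of the work can run in parallel. The sketch below is illustrative only: the parallel fractions are made-up values, not measurements of this project, and the model ignores clock-speed differences and contention for shared resources.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: speedup over one worker when only part of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical parallel fractions, compared at 6 and 28 cores.
for p in (0.90, 0.95, 0.99):
    six = amdahl_speedup(p, 6)
    twenty_eight = amdahl_speedup(p, 28)
    print(f"p={p:.2f}  6 cores: {six:5.2f}x  28 cores: {twenty_eight:5.2f}x")
```

Under this model alone, adding workers never makes a build slower, so a serial bottleneck would have to be combined with a large per-core speed gap to account for a 2x difference.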


I would like to learn more about how to diagnose such an issue, since I'm pretty much out of ideas for what to try next.

Answered by john daniel in 403175022

Perhaps you are RAM-constrained rather than CPU-bound. Your 6-core build would use your RAM/6. Your 28-core build would use your RAM/28. And since all of these are likely hyperthreaded, that is RAM/56. Therefore, if you have 32 GB of RAM on the MBP, you get about 2.3 GB per core. But if you have 64 GB on the MP, you only get 1.07 GB of RAM per core.


Try using "cpuctl" to turn off some of your CPUs. Then do a 6-core to 6-core compare. Then turn on CPUs to give each one the same amount of RAM as the MBP. If all of that seems to make sense, get more RAM for the MP.
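
The per-core arithmetic above can be written out as a small sketch. The 32 GB and 64 GB sizes are the answer's hypotheticals. The quoted figures (about 2.3 GB and 1.07 GB) are reproduced if roughly 4 GB is set aside for the system on each machine; that 4 GB reservation is an assumption made here, not something the answer states. Both function names are hypothetical.

```python
def ram_per_hardware_thread(total_gb, cores, threads_per_core=2, reserved_gb=0.0):
    """GB of RAM available to each hardware thread if all of them are busy."""
    return (total_gb - reserved_gb) / (cores * threads_per_core)


def threads_for_target(total_gb, target_gb_per_thread, reserved_gb=0.0):
    """Largest whole number of threads that still get the target share each."""
    return int((total_gb - reserved_gb) / target_gb_per_thread)


RESERVED_GB = 4.0  # assumed system overhead; not stated in the answer

mbp = ram_per_hardware_thread(32, cores=6, reserved_gb=RESERVED_GB)
mp = ram_per_hardware_thread(64, cores=28, reserved_gb=RESERVED_GB)
print(f"MacBook Pro: {mbp:.2f} GB per hardware thread")
print(f"Mac Pro:     {mp:.2f} GB per hardware thread")

# How many Mac Pro hardware threads could be enabled at the MacBook Pro's share.
print(f"Mac Pro threads at the MBP share: {threads_for_target(64, mbp, RESERVED_GB)}")
```

The last line is the "same amount of RAM as the MBP" experiment expressed as a thread count, which gives a starting point for how many CPUs to leave enabled.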

To my mind, a RAM disk is potentially self-defeating: it takes memory that could be used more directly for processing and uses it through code that simulates a disk drive, with the overhead of a file system. Disk contents should already be cached in RAM in what is likely a more efficient manner.


I would expect that allowing more parallel operations would, up to a point, increase the overall speed of compiling.


Ultimately, the speed matters only to the extent it matters to you. If you have to sit and wait the extra time, that can matter; time is money, after all.

Great idea, that's exactly what I needed. Looks like enabling cores 0-21 gives the best performance, which is approximately the same as the laptop.


I agree with the theory that I’m RAM bound, since there really isn’t anything else left to contend over. I only have DIMMs in half the slots. I wonder if I could double my memory bandwidth by installing six more?
