Clang 15.0 produces slow C++ applications

Hello,

I run macOS Ventura 13.6 and Command Line Tools 15.0 on an Intel Core i7 MacBook (post-2018).

After installing Clang 15.0, my C++ test programs run 4 to 5 times slower than with Clang 13.0.

Has anybody observed this slowdown?

The tests, which do a lot of mathematical computation, are compiled with the following command:

g++ -std=c++17 -march=native -funroll-loops -Ofast -DNDEBUG -o a atest.cpp

So I had to revert to Clang 13.0 to get a reasonable execution time.

What makes the C++ code so slow?



Replies

So -O3 makes code slower than -Ofast

I tried -Wl,-ld_classic; it makes no difference.

I notice that -march=native makes the code very much slower.

I guess what changed with this last version is -march=native

Maybe there is less support for Intel processors.

With the previous version of clang -march=native made code faster.

I don't know about assembly.

The fact is that my code and the compiler flags have not changed; the changes are in Clang 15.0 and are not documented.

The same code runs faster in Linux guests under VMware and VirtualBox with GNU gcc or g++, and on Windows Boot Camp with MinGW gcc/g++.

To be precise, my code does not use a graphical UI. It only prints results to the terminal.

  • -march=native seems to be critical with Intel


I checked with the -### option.

apple-macosx14.0.0 is invoked. Is that right, or should it be apple-macosx15.0.0?

-march=native is in effect, but it makes the code slower. With the previous version, -march=native made the code faster.

Is the disk encrypted by default on Sonoma, which could make the code slower?

Sorry, I should have put this in a reply rather than a comment.

If you want, I can send you one of my examples: a group of about 15 small files in a zip archive.

If you want, I can send you one of my examples: a group of about 15 small files in a zip archive.

Yes, it'll be much easier if we can reproduce. I'd suggest filing a Feedback, attaching the example with instructions, and posting the FB number here.

apple-macosx14.0.0 is invoked. Is that right, or should it be apple-macosx15.0.0?

This is fine.

-march=native is in effect, but it makes the code slower. With the previous version, -march=native made the code faster.

It really depends on which optimizations are chosen and data that's fed into it. In general, yes, -march=native should produce better code. However, there can be overhead depending on data layout, size, and other factors.

Is the disk encrypted by default on Sonoma, which could make the code slower?

File system encryption is unlikely to be the issue here. Consider that both good and bad cases are running in the same environment.
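One way to compare the good and bad cases in the same environment is to build one small kernel with different flags and time it. The following is only a minimal sketch: `dot_kernel` and `best_time_ms` are hypothetical names, and the dot-product loop is a stand-in for the real test code, which is not shown in this thread.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical math-heavy kernel standing in for the real test program.
// Assumes both vectors have the same length.
double dot_kernel(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// Runs the kernel `repeats` times and returns the best wall-clock time in
// milliseconds. The last kernel result is written to `result_out` so the
// compiler cannot discard the computation.
double best_time_ms(const std::vector<double>& a, const std::vector<double>& b,
                    int repeats, double& result_out) {
    using clock = std::chrono::steady_clock;
    double best = -1.0;
    for (int r = 0; r < repeats; ++r) {
        const auto start = clock::now();
        result_out = dot_kernel(a, b);
        const auto stop = clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(stop - start).count();
        if (best < 0.0 || ms < best) {
            best = ms;
        }
    }
    return best;
}
```

Called from a small main() and built once with the original command and once without -march=native, the two best times can be compared directly, since everything else about the machine stays the same.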

Hi,

The FB number is FB13252912. I sent a zip file which runs a test written entirely in C. The same code compiled with the fastest options gives a better run time in Linux VirtualBox and VMware guests and on Windows Boot Camp.

You'll see the difference by comparing Clang 15 with Clang 14 on a 2020 Intel MacBook Pro.

  • Thanks for filing the Feedback! It looks like the attached archive is 29 bytes. Can you try re-uploading?

  • The right FB is FB13253046

  • Thank you! We're able to reproduce. You might give -march=haswell a try.

It was impossible to re-upload, so I filed a new FB:
FB13253046

Thanks for this information.

But on string manipulation, with -march=native or -march=haswell, I get very poor performance compared with Linux guests under VMware or VirtualBox, or with Windows Boot Camp. It's almost 2 times slower.

I can send you the test.

I attached it to the same FB number as test2.zip

FB13253046

or

FB13256895

  • Well, it's surprising that the code in the Linux guest virtual machines (and Windows Boot Camp) runs that much faster.

  • Thanks again for the examples!

    I've only looked at macOS, not the other platforms. However, I'll mention that these string benchmarks are mostly dominated by memory management, not the string ops. The test strings are small and won't benefit from loop optimizations like vectorization, unrolling, aligning to caches. If this is a core functionality in your app, you'll see much greater performance wins with better tuned data structures and algorithms.

  • Well, if we set my code aside and focus on std::string from the standard library, the performance gap between the guest virtual machines and the Mac host stays significant. I find it surprising, and probably nothing can be done to reduce this gap.
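As an illustration of the memory-management point above, here is a hedged sketch (not the code from the Feedback, which is not shown in this thread). It counts how often a std::string's buffer grows while small pieces are appended. `count_reallocations` is a hypothetical helper, and a change in capacity is used as a proxy for a reallocation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Appends every piece to one string and counts how many times the string's
// capacity changes along the way (a proxy for buffer reallocations).
// If `reserve_first` is true, the total size is reserved before appending.
std::size_t count_reallocations(const std::vector<std::string>& pieces,
                                bool reserve_first) {
    std::string out;
    if (reserve_first) {
        std::size_t total = 0;
        for (const auto& p : pieces) {
            total += p.size();
        }
        out.reserve(total);
    }
    std::size_t reallocations = 0;
    std::size_t last_capacity = out.capacity();
    for (const auto& p : pieces) {
        out += p;
        if (out.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = out.capacity();
        }
    }
    return reallocations;
}
```

With the size reserved up front the count stays at zero. Without it, the exact count depends on the standard library's growth policy, which may be one reason the same string benchmark behaves differently across platforms.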

-march=native is a pretty bad compile setting. You really aren't being specific about the architecture then. And for SIMD-based libraries you need to enable that too (-mavx2, -msse4.2, etc). I was using that arch setting on Windows, and the compiler then uses whatever SIMD architecture the machine you compile on has. So I'd get AVX-512 code that would then crash on newer machines that omit that support. If you're running the Intel code emulated on Rosetta 2, then you can expect a 2x slowdown there.

Also note that for Intel apps to run under Rosetta 2, I had to disable AVX and drop to SSE4.2. I also had to drop f16 support, since neither of these is supported.
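To see which SIMD extensions a given -march or -m flag actually turns on, one option is to check the feature macros that Clang and GCC predefine. This is a minimal sketch; `enabled_simd_features` is a hypothetical helper name, and treating F16C as the "f16" support mentioned above is an assumption.

```cpp
#include <string>

// Returns a space-separated list of the x86 SIMD feature macros defined for
// this translation unit. These macros are set by Clang and GCC according to
// flags such as -march=native, -march=haswell, -mavx2 or -msse4.2.
std::string enabled_simd_features() {
    std::string features;
#if defined(__SSE4_2__)
    features += "SSE4.2 ";
#endif
#if defined(__AVX__)
    features += "AVX ";
#endif
#if defined(__AVX2__)
    features += "AVX2 ";
#endif
#if defined(__AVX512F__)
    features += "AVX-512F ";
#endif
#if defined(__F16C__)
    // Assumption: F16C is what "f16 support" refers to in the post above.
    features += "F16C ";
#endif
    if (features.empty()) {
        features = "(none of the checked features)";
    }
    return features;
}
```

Printing the result from builds made with different flags shows what the compiler was allowed to use at compile time. It does not show what the CPU or an emulator supports at run time.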