Reply to Clang 15.0 produces slow c++ applications
I checked with the -### option: apple-macosx14.0.0 is the target that gets invoked. Is that right, or should it be apple-macosx15.0.0? The -march=native flag is effective, but it makes the code slower, whereas with the previous version -march=native made the code faster. Is the disk encrypted by default with Sonoma, which could make code slower? Sorry, I thought it better to put this in a reply than in a comment. If you want, I can send you one of my examples: a group of about 15 small files in a zip archive.
Oct ’23
Reply to Clang 15.0 produces slow c++ applications
So -O3 makes the code slower than -Ofast. I tried -Wl,-ld_classic: it gives no difference. I notice -march=native makes the code very much slower. I guess what changed with this last version is -march=native, maybe less support for Intel processors; with the previous version of clang, -march=native made the code faster. I don't know about assembly. The fact is that my code and the compiler flags have not changed, but the changes are in clang 15.0 and they are not documented. The same code runs faster on Linux guests (VMware and VirtualBox) with GNU gcc or g++, and on Windows Boot Camp with MinGW gcc/g++. I should add that my code does not use a graphical UI; it only prints results on the terminal.
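To check whether -march=native is still enabling the same instruction sets on my Intel machine under both toolchains, a minimal sketch like the one below could be compiled with the same flags by each compiler and the outputs compared. This is only an illustration I put together; the file name check_isa.cpp is an example and it is not part of my application:

// Compile with each toolchain, e.g. clang++ -std=c++17 -march=native -Ofast check_isa.cpp
// then run and compare which instruction-set macros each compiler defines.
#include <cstdio>

int main()
{
#if defined(__AVX2__)
    std::puts("AVX2 enabled");
#endif
#if defined(__AVX__)
    std::puts("AVX enabled");
#endif
#if defined(__FMA__)
    std::puts("FMA enabled");
#endif
#if defined(__SSE4_2__)
    std::puts("SSE4.2 enabled");
#endif
#if defined(__FAST_MATH__)
    std::puts("fast-math enabled");   // defined when -Ofast / -ffast-math is active
#endif
    return 0;
}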
Oct ’23
Reply to Clang 15.0 produces slow c++ applications
Hi, I don't see what you mean with clang -###. I won't try all the command line options, and I am not able to search through disassemblies. On Linux the clang version is 15.0.7, but I use GNU gcc. What is -ld_classic? I notice a global loss of run-time performance between Apple clang version 14.0.3 (clang-1403.0.22.14.1) and Apple clang version 15.0.0 (clang-1500.0.40.1), and Apple made it impossible to revert to the 14.0.3 command line tools.
Oct ’23
Reply to Clang 15.0 produces slow c++ applications
Hi, Yes, I compare things that are comparable: clang 15.0 against the previous version, using the command line tools, not Xcode, on the same MacBook Pro Intel 2020. My previous toolchain was Command_Line_Tools_for_Xcode_14.3.1_Release_Candidate. What I'm measuring is run time, not compile time. My code only makes sense if it is fast; I do not care about compile time. By eliminating -march=native I get a less bad performance. For now, with different modifications, I am 2 times slower than with the previous clang. Some things have changed in this last version of the clang compiler that are not documented. I confirm that the same application, in C or C++, runs about 2 times faster on the Linux guests (VMware and VirtualBox), and the same 2 times faster with MinGW C++ on Windows Boot Camp. The only one which lacks performance is the latest Apple clang 15.0, g++ or clang.
With the previous toolchain I used:
g++ -std=c++17 -Ofast -march=native -funroll-loops -flto -DNDEBUG -o a prog.cpp
With clang 15.0, to get a less bad performance:
g++ -std=c++17 -Ofast -funroll-loops -flto -DNDEBUG -o a prog.cpp
The performance:
On Linux guests: about 11 seconds
On the previous clang: about 10 seconds
On Windows Boot Camp: about 11 seconds
On the latest Apple clang 15.0: about 22 seconds
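For what it is worth, the run time I report is the wall-clock time of the computation itself, measured roughly like this minimal sketch (run_computation() is only a placeholder here, it is not my real code):

#include <chrono>
#include <cstdio>

// Placeholder workload, standing in for the real application code.
static double run_computation()
{
    double acc = 0.0;
    for ( long i = 0; i < 200000000L; i++ )
        acc += static_cast<double>(i) * 1e-9;
    return acc;
}

int main()
{
    const auto t0 = std::chrono::steady_clock::now();
    const double result = run_computation();
    const auto t1 = std::chrono::steady_clock::now();
    const double seconds = std::chrono::duration<double>( t1 - t0 ).count();
    std::printf( "result = %f, wall time = %.3f s\n", result, seconds );
    return 0;
}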
Oct ’23
Reply to Clang 15.0 produces slow c++ applications
KT is std::vector<double>. Mtype is std::vector<double> with different subscripting. T is double. There are no assembler instructions. -O3 does better than -Ofast: 3 times slower instead of 5 times slower. You pointed at the right thing; it seems the -Ofast option does not work any more. My opinion is that there is a security add-on which blocks the code, something that blocks memory access or that systematically verifies array accesses or array bounds? Or some new default options that make code slower? The two versions of my code, in C and C++, have been tested with valgrind on Linux and give no errors (memory and array bounds).
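As far as I understand, the documented difference between -O3 and -Ofast is mainly that -Ofast adds -ffast-math, which lets the compiler reassociate and vectorize floating-point reductions. Here is a minimal sketch of the kind of loop that is sensitive to this choice; the dot function is only an illustration, it is not taken from my code:

#include <cstddef>
#include <cstdio>
#include <vector>

// A plain floating-point reduction. Under -O3 the compiler must preserve the
// exact left-to-right summation order, which limits vectorization; under
// -Ofast (-ffast-math) it may reassociate the sum and use SIMD.
double dot( const std::vector<double>& a, const std::vector<double>& b )
{
    double s = 0.0;
    for ( std::size_t i = 0; i < a.size(); i++ )
        s += a[i] * b[i];
    return s;
}

int main()
{
    std::vector<double> a( 1000000, 1.5 ), b( 1000000, 2.0 );
    std::printf( "%f\n", dot( a, b ) );
    return 0;
}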
Oct ’23
Reply to Clang 15.0 produces slow c++ applications
I'm afraid the algorithmic part has nothing to do with the information I gave you; I can confirm it in this sense. What's more, on my Mac I have two virtual machines for simple use, VMware and VirtualBox. The same code I mentioned earlier runs in 12 seconds on VMware and 13 seconds on VirtualBox, knowing that these two virtual machines use far fewer resources than the macOS host. To be precise, the regression appears with clang 15.0, not clang 13.x. So it is 5 times slower with g++/clang 15.0. If you want a part of the computational code, here it is:

template < typename T >
int OAnacorr<T>::compute_afc_for_burt()
{
    size_t n = 0, i = 0 ;
    T AKSI, ALAMBDA, PHI2, perc = 0 ;
    T CUMUL ;
    long TEST0, TEST1 ;
    T TOTA = M.sum() ;
    KT VL, VC ;
    VC.resize( M.getnc() );
    VL.resize( M.getnl() );
    Mtype M2( M.getnl(), M.getnc() ) ;
    KT PII ;
    M.peek_sum_rows( PII ) ;
    M.to_percent();
    KT PJ ;
    M.peek_sum_cols( PJ );
    Mtype K2( M ) ;
    KT KI( M.getnc() ) ;
    CUMUL = 0.0 ;
    PHI2 = K2.khi_deux_pond() ;
    if( Mtype::isnan(PHI2) || PHI2 <= 0 ) return 1;
    K2.peek_sum_cols( KI ) ;
    KT VVI( KI ) ;
    KT SVI( KI ) ;
    Mtype TH = M.get_theoric() ;
    /*****************************************/
    KT VCSUP ;
    KT KSUP ;
    KT SKI ;
    KT A, D ;
    if( g_xsup > 0 )
    {
        VCSUP.resize( TSC.getnl() );
        KSUP.resize( TSC.getnl() );
        A.resize( VCSUP.size() );
        D.resize( VCSUP.size() ) ;
        TSC.peek_sum_rows( KSUP );
        Mtype STHEO( TSC.getnl(), TSC.getnc() ) ;
        for( size_t i = 0 ; i < TSC.getnl(); i++ )
            for( size_t j = 0; j < TSC.getnc(); j++ )
                STHEO(i,j) = (PII[j] * KSUP[i]) / TOTA ;
        Mtype ECA = TSC - STHEO;
        Mtype SKI2 = ( ECA * ECA ) / STHEO ;
        SKI2 /= TOTA ;
        SKI2.peek_sum_rows( SKI );
        for( size_t i = 0; i < VCSUP.size(); i++ )
            D[i] = KSUP[i] / TOTA ;
        TSC.to_percent();
        vect_to_percent( KSUP );
        inth( TSC, KSUP, PJ, 1.0 );
    }
    os << "\nAnalyse des correspondances (AFC)" << std::endl << std::endl ;
    os << "Phi Deux = " << std::setw(8) << std::setprecision(6) << std::fixed << PHI2 << std::endl;
    for ( n = 0 ; n < g_nbf ; n++ )
    {
        if( ( 100.00 - CUMUL ) < 0.0000001 ) goto tend ;
        vect_zero( M, VL ) ;
        i = 0 ;
        do
        {
            TEST0 = (VL[0] * 10000000.0) ;
            prod_by_cols( M, VC, VL ) ;
            AKSI = reduce_by_pond( PJ, VC ) ;
            i++;
            if( i > 20000 ) { return 1 ; }
            prod_by_rows( M, VL, VC ) ;
            AKSI = reduce_by_pond( PJ, VL ) ;
            TEST1 = (VL[0] * 10000000.0) ;
        } while ( TEST1 != TEST0 );
        if( g_xsup > 0 )
        {
            prod_by_rows( TSC, VCSUP, VC );
            T RX = reduce_by_pond( KSUP, VCSUP );
            mul_vect( VCSUP, RX );
            for ( size_t i = 0 ; i < VCSUP.size(); i++ )
                if( Mtype::isnan(VCSUP[i]) ) VCSUP[i] = 0 ;
            WSUP.push_back( VCSUP );
        }
        mul_vect( VC, AKSI );
        WWC.push_back( VC ) ;
        ALAMBDA = ( n != 0 ) ? AKSI * AKSI : PHI2 ;
        rebuild_pond( M2, VC, PJ, AKSI ) ;
        M -= M2 ;
        if( n == 0 )
        {
            WWC.push_back( PJ );
            if( g_xsup > 0 ) { WSUP.push_back( VCSUP ); }
        }
        if( n != 0 )
        {
            mul_and_div( M2, TH );
            M2.peek_sum_cols( KI ) ;
            SVI = KI ;
            div_vect( SVI, VVI );
            WWC.push_back( SVI );
            perc = (ALAMBDA / PHI2) * 100 ;
            CUMUL += perc ;
            g_nbvectors += 1 ;
            if( g_xsup > 0 )
            {
                for( size_t i = 0 ; i < VCSUP.size(); i++ )
                {
                    A[i] = VCSUP[i] * VCSUP[i] * D[i] / SKI[i] ; // cos2
                    if ( Mtype::isnan(A[i]) ) A[i] = 0 ;
                    if( A[i] >= 1 ) A[i] = 0.999;
                }
                WSUP.push_back( A );
            }
            std::ostringstream ox ;
            ox << "F" << n ;
            os << std::setw(5) << std::setfill(' ') << std::left << ox.str()
               << " Val Propre = " << std::setw(8) << std::setprecision(6) << std::fixed << ALAMBDA
               << " Pourcent= " << std::setw(5) << std::setprecision(2) << std::right << std::fixed << perc
               << " Cumulé= " << std::setw(6) << std::setprecision(2) << std::right << CUMUL
               << " Nb iter= " << std::setw(5) << std::right << ((n>0) ? i : i) << std::endl ;
        }
        div_vect( KI, ALAMBDA );
        WWC.push_back( KI );
        if( g_xsup > 0 )
        {
            for( size_t i = 0 ; i < VCSUP.size(); i++ )
            {
                A[i] = VCSUP[i] * VCSUP[i] * D[i] / ALAMBDA ; // cpf
                if ( Mtype::isnan(A[i]) ) A[i] = 0 ;
                if( A[i] >= 1 ) A[i] = 0.999;
            }
            WSUP.push_back( A );
        }
    }
tend:
    g_nbf = n ;
    os << std::endl;
    return 0;
}

djm44
Oct ’23
Reply to Clang 15.0 produces slow c++ applications
Hi, No, I can't give all the code of my applications. To give an idea, with or without the Eigen library, it is about factor analysis. I made C and C++ versions. The tests compute the factors of the 64 questions of a survey, which gives a square matrix of 306 items. The method used to extract the factors is algorithmic. On a MacBook Pro Intel 2020 the total computation in C++ or C takes about 10 seconds with clang 13.x but 54 seconds with clang 15. For C++, as I mentioned, the compiler options are -std=c++17 -Ofast -march=native -funroll-loops -flto -DNDEBUG. For the C version, the same: gcc -Ofast -march=native -funroll-loops -flto -DNDEBUG file1.c file2.c cpp1.o file3.c -o a -lm -lstdc++. The computation time suffers a big regression, and I do not see where it comes from. djm44
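If it helps, here is a small self-contained sketch, hypothetical and not my real code, just a workload of a similar shape (repeated matrix-vector products on a 306 x 306 dense matrix), that could be compiled with both toolchains and the same flags to check whether the regression reproduces:

#include <chrono>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

int main()
{
    const std::size_t n = 306 ;
    std::vector<double> m( n * n ), v( n, 1.0 ), w( n ) ;
    for ( std::size_t i = 0; i < n; i++ )
        for ( std::size_t j = 0; j < n; j++ )
            m[ i * n + j ] = 1.0 / ( 1.0 + static_cast<double>( i + j ) ) ;

    const auto t0 = std::chrono::steady_clock::now();
    for ( int iter = 0; iter < 20000; iter++ )
    {
        // w = M * v
        for ( std::size_t i = 0; i < n; i++ )
        {
            double s = 0.0 ;
            for ( std::size_t j = 0; j < n; j++ )
                s += m[ i * n + j ] * v[j] ;
            w[i] = s ;
        }
        // normalize and feed back into the next iteration
        double norm = 0.0 ;
        for ( std::size_t i = 0; i < n; i++ ) norm += w[i] * w[i] ;
        norm = std::sqrt( norm ) ;
        for ( std::size_t i = 0; i < n; i++ ) v[i] = w[i] / norm ;
    }
    const auto t1 = std::chrono::steady_clock::now();
    std::printf( "v[0] = %g, time = %.2f s\n",
                 v[0], std::chrono::duration<double>( t1 - t0 ).count() );
    return 0;
}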
Oct ’23