Posts

Post not yet marked as solved
22 Replies
Thanks for this information. But on string manipulation with -march=native or -march=haswell I get very poor performance compared with Linux guests (VMware or VirtualBox) or Windows Boot Camp. It's almost 2 times slower. I can send you the test. I put it under the same FB number as test2.zip: FB13253046 or FB13256895.
It was impossible to re-upload, so I filed a new FB: FB13253046.
Hi, the FB number is FB13252912. I sent a zip file that runs a test written purely in C. The same code compiled with the fastest options gives a better run time on Linux guests (VirtualBox, VMware) and on Windows Boot Camp. You'll see the difference by comparing clang 15 with clang 14 on a MacBook Pro Intel 2020.
I checked with the -### option: apple-macosx14.0.0 is invoked. Is that right, or should it be apple-macosx15.0.0? The -march=native option takes effect, but it makes the code slower. With the previous version, -march=native made the code faster. Is the disk encrypted by default with Sonoma, which could make code slower? Sorry, I should have put this in a reply rather than a comment. If you want, I can send you one of my examples: a group of about 15 small files in a zip archive.
So -O3 makes the code slower than -Ofast. I tried -Wl,-ld_classic; it makes no difference. I notice that -march=native makes the code very much slower. I guess what changed with this latest version is -march=native, maybe less support for Intel processors. With the previous version of clang, -march=native made the code faster. I don't know assembly. The fact is that my code and the compiler flags have not changed; the changes are in clang 15.0 and are not documented. The same code runs faster on Linux guests (VMware and VirtualBox) with GNU gcc or g++, and on Windows Boot Camp with MinGW gcc or g++. To be clear, my programs do not use a graphical UI; they only print results to the terminal.
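One way to check what -march=native actually enables is to ask the compiler through its predefined macros. The following is only a minimal sketch, assuming a GCC- or Clang-compatible compiler: `__VERSION__`, `__SSE4_2__`, `__AVX2__` and `__FMA__` are macros those compilers predefine, while the function name `build_report` is made up for this illustration.

```cpp
#include <string>

// Hypothetical helper (not from the original posts): returns a one-line
// summary of the compiler version and of a few x86 instruction-set
// extensions that were enabled when this file was compiled.
std::string build_report()
{
    std::string r = "compiler: ";
#ifdef __VERSION__
    r += __VERSION__;            // predefined by GCC and Clang
#else
    r += "unknown";
#endif
    r += " | isa:";
#ifdef __SSE4_2__
    r += " sse4.2";
#endif
#ifdef __AVX2__
    r += " avx2";
#endif
#ifdef __FMA__
    r += " fma";
#endif
    return r;
}
```

Printing `build_report()` from binaries built with and without -march=native, under each toolchain, would show whether the same extensions are enabled in every case.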
Hi, I don't see what you mean by clang -###. I won't try every command-line option. I am not able to search through disassemblies. On Linux the clang version is 15.0.7, but I use GNU gcc. What is -ld_classic? I notice an overall loss of run-time performance between Apple clang version 14.0.3 (clang-1403.0.22.14.1) and Apple clang version 15.0.0 (clang-1500.0.40.1). And Apple made it impossible to revert to the 14.0.3 command line tools.
Hi, Yes, I am comparing like with like: clang 15.0 against the previous version, using the command line tools, not Xcode, on the same MacBook Pro Intel 2020. My previous toolchain was Command_Line_Tools_for_Xcode_14.3.1_Release_Candidate. What I'm measuring is run time, not compile time. The point of my code is to be fast; I do not care about compile time. By removing -march=native I get less bad performance. For now, with various modifications, it is still 2 times slower than with the previous clang. Something has changed in this latest version of the clang compiler that is not documented. I confirm that the same application, in C or C++, runs about 2 times faster on the Linux guests (VMware and VirtualBox), and likewise about 2 times faster with MinGW C++ on Windows Boot Camp. The only one lacking performance is the latest Apple clang 15.0 (g++ or clang).
With the previous toolchain I used:
    g++ -std=c++17 -Ofast -march=native -funroll-loops -flto -DNDEBUG -o a prog.cpp
With clang 15.0, to get less bad performance:
    g++ -std=c++17 -Ofast -funroll-loops -flto -DNDEBUG -o a prog.cpp
The timings:
    On Linux guests: about 11 seconds
    On the previous clang: about 10 seconds
    On Windows Boot Camp: about 11 seconds
    On the latest Apple clang 15.0: about 22 seconds
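Since the numbers above are run times rather than compile times, timing the computation from inside the program keeps start-up and I/O out of the comparison. This is a minimal sketch using `std::chrono::steady_clock`; the names `time_seconds` and `sample_workload` are hypothetical, and the workload is a stand-in, not the survey computation.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in seconds, measured with a monotonic clock.
double time_seconds(const std::function<void()>& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Stand-in workload (an assumption for illustration only):
// partial sum of 1/k^2 for k = 1..n, which approaches pi^2/6.
double sample_workload(std::size_t n)
{
    double s = 0.0;
    for (std::size_t k = 1; k <= n; ++k)
    {
        const double x = static_cast<double>(k);
        s += 1.0 / (x * x);
    }
    return s;
}
```

Wrapping the real computation in a lambda passed to `time_seconds`, and running it several times per build, would give figures that can be compared directly across toolchains.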
Post not yet marked as solved
5 Replies
Hi, You mention slow compile times with Xcode, which also uses the command line tools. But did you notice slower execution (run time)? Well, if you can, test it. I did.
Post not yet marked as solved
22 Replies
I am not used to using mailing lists. I read the release notes for Apple clang 15.0. They are very bulky. I did not notice anything about changes to the -Ofast command-line option. What specific flags does -Ofast add to -O3? For my test, -Ofast was compatible with the computations.
KT is std::vector<double>. Mtype is std::vector<double> with a different subscript. T is double. There are no assembler instructions. -O3 does better than -Ofast: 3 times slower instead of 5 times slower. You pointed out the right thing. It seems the -Ofast option does not work any more. My opinion is that there is a security add-on that blocks the code, something that blocks memory access or that systematically verifies array accesses or array bounds? Or some new default options that make code slower? Both versions of my code, in C and C++, have been tested with valgrind on Linux and give no errors (memory or array bounds).
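To make the description of the types concrete, here is a minimal sketch of a dense matrix stored in a single std::vector<double> and indexed as m(i, j). It is a hypothetical stand-in, not the actual Mtype: the row-major layout, the reading of getnl/getnc as row and column counts, and the separate checked accessor `at` are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for Mtype: row-major storage in one vector.
class Matrix
{
public:
    Matrix(std::size_t nl, std::size_t nc)
        : nl_(nl), nc_(nc), data_(nl * nc, 0.0) {}

    std::size_t getnl() const { return nl_; }   // number of rows (assumed meaning)
    std::size_t getnc() const { return nc_; }   // number of columns (assumed meaning)

    // Unchecked element access, as in STHEO(i, j).
    double& operator()(std::size_t i, std::size_t j) { return data_[i * nc_ + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data_[i * nc_ + j]; }

    // Checked element access: throws if (i, j) is outside the matrix.
    double& at(std::size_t i, std::size_t j)
    {
        if (i >= nl_ || j >= nc_) throw std::out_of_range("Matrix::at");
        return data_[i * nc_ + j];
    }

    // Sum of all elements.
    double sum() const { return std::accumulate(data_.begin(), data_.end(), 0.0); }

private:
    std::size_t nl_;
    std::size_t nc_;
    std::vector<double> data_;
};
```

Keeping the checked and unchecked paths separate makes it easy to time the same loops both ways and see how much bounds checking alone would cost.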
I'm afraid the algorithmic portion has nothing to do with the information I gave you. I can confirm that. What's more, on my Mac I have two virtual machines for simple use, VMware and VirtualBox. The same code I mentioned earlier runs in 12 seconds on VMware and 13 seconds on VirtualBox, even though these two virtual machines use far fewer resources than the host macOS. To be clear, the regression appears with clang 15.0, not clang 13.x. So g++ (clang 15.0) is 5 times slower. If you want, here is part of the computational code:

    template < typename T >
    int OAnacorr<T>::compute_afc_for_burt()
    {
        size_t n = 0, i = 0 ;
        T AKSI, ALAMBDA, PHI2, perc = 0 ;
        T CUMUL ;
        long TEST0, TEST1 ;
        T TOTA = M.sum() ;
        KT VL, VC ;
        VC.resize( M.getnc() );
        VL.resize( M.getnl() );
        Mtype M2( M.getnl(), M.getnc() ) ;
        KT PII ;
        M.peek_sum_rows( PII ) ;
        M.to_percent();
        KT PJ ;
        M.peek_sum_cols( PJ );
        Mtype K2( M ) ;
        KT KI( M.getnc() ) ;
        CUMUL = 0.0 ;
        PHI2 = K2.khi_deux_pond() ;
        if( Mtype::isnan(PHI2) || PHI2 <= 0 ) return 1;
        K2.peek_sum_cols( KI ) ;
        KT VVI( KI ) ;
        KT SVI( KI ) ;
        Mtype TH = M.get_theoric() ;
        /*****************************************/
        KT VCSUP ;
        KT KSUP ;
        KT SKI ;
        KT A, D ;
        if( g_xsup > 0 )
        {
            VCSUP.resize( TSC.getnl());
            KSUP.resize( TSC.getnl() );
            A.resize ( VCSUP.size() );
            D.resize ( VCSUP.size() ) ;
            TSC.peek_sum_rows( KSUP );
            Mtype STHEO ( TSC.getnl(), TSC.getnc() ) ;
            for( size_t i = 0 ; i < TSC.getnl(); i++ )
                for( size_t j = 0; j < TSC.getnc(); j++ )
                    STHEO(i,j) = (PII[j] * KSUP[i]) / TOTA ;
            Mtype ECA = TSC - STHEO;
            Mtype SKI2 = ( ECA * ECA ) / STHEO ;
            SKI2 /= TOTA ;
            SKI2.peek_sum_rows( SKI );
            for( size_t i = 0; i < VCSUP.size(); i++ )
                D[i] = KSUP[i] / TOTA ;
            TSC.to_percent();
            vect_to_percent( KSUP );
            inth( TSC, KSUP, PJ, 1.0 );
        }
        os << "\nAnalyse des correspondances (AFC)" << std::endl << std::endl ;
        os << "Phi Deux = " << std::setw(8) << std::setprecision(6) << std::fixed << PHI2 << std::endl;
        for ( n = 0 ; n < g_nbf ; n++ )
        {
            if( ( 100.00 - CUMUL ) < 0.0000001 ) goto tend ;
            vect_zero( M, VL ) ;
            i=0 ;
            // Iterate until the first component of VL, scaled by 1e7 and
            // truncated to long, stops changing (at most 20000 iterations).
            do
            {
                TEST0 = (VL[0] * 10000000.0) ;
                prod_by_cols( M, VC, VL ) ;
                AKSI = reduce_by_pond( PJ, VC) ;
                i++;
                if( i > 20000 ) { return 1 ; }
                prod_by_rows( M, VL, VC ) ;
                AKSI = reduce_by_pond( PJ, VL ) ;
                TEST1 = (VL[0] * 10000000.0) ;
            } while ( TEST1 != TEST0 );
            if( g_xsup > 0 )
            {
                prod_by_rows( TSC, VCSUP, VC );
                T RX = reduce_by_pond( KSUP, VCSUP );
                mul_vect( VCSUP, RX );
                for ( size_t i = 0 ; i < VCSUP.size(); i++ )
                    if( Mtype::isnan(VCSUP[i])) VCSUP[i] = 0 ;
                WSUP.push_back( VCSUP );
            }
            mul_vect( VC, AKSI );
            WWC.push_back( VC ) ;
            ALAMBDA = ( n != 0 ) ? AKSI * AKSI : PHI2 ;
            rebuild_pond( M2, VC, PJ, AKSI ) ;
            M -= M2 ;
            if( n == 0 )
            {
                WWC.push_back( PJ );
                if( g_xsup > 0 ) { WSUP.push_back(VCSUP); }
            }
            if( n != 0 )
            {
                mul_and_div( M2, TH );
                M2.peek_sum_cols( KI ) ;
                SVI = KI ;
                div_vect( SVI, VVI );
                WWC.push_back( SVI );
                perc = (ALAMBDA / PHI2) * 100 ;
                CUMUL += perc ;
                g_nbvectors += 1 ;
                if( g_xsup > 0 )
                {
                    for( size_t i = 0 ; i < VCSUP.size(); i++ )
                    {
                        A[i] = VCSUP[i] * VCSUP[i] * D[i] / SKI[i] ; // cos2
                        if ( Mtype::isnan(A[i])) A[i] = 0 ;
                        if( A[i] >= 1 ) A[i] = 0.999;
                    }
                    WSUP.push_back(A);
                }
                std::ostringstream ox ;
                ox << "F" << n ;
                os << std::setw(5) << std::setfill(' ') << std::left << ox.str()
                   << " Val Propre = " << std::setw(8) << std::setprecision(6) << std::fixed << ALAMBDA
                   << " Pourcent= " << std::setw(5) << std::setprecision(2) << std::right << std::fixed << perc
                   << " Cumulé= " << std::setw(6) << std::setprecision(2) << std::right << CUMUL
                   << " Nb iter= " << std::setw(5) << std::right << ((n>0) ? i : i) << std::endl ;
            }
            div_vect( KI, ALAMBDA );
            WWC.push_back( KI);
            if( g_xsup > 0 )
            {
                for( size_t i = 0 ; i < VCSUP.size(); i++ )
                {
                    A[i] = VCSUP[i] * VCSUP[i] * D[i] / ALAMBDA ; // cpf
                    if ( Mtype::isnan(A[i])) A[i] = 0 ;
                    if( A[i] >= 1 ) A[i] = 0.999;
                }
                WSUP.push_back(A);
            }
        }
    tend:
        g_nbf = n ;
        os << std::endl;
        return 0;
    }

djm44
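The inner do/while loop in the excerpt stops when the first vector component, scaled by 10^7 and truncated to a long, no longer changes between two passes. The sketch below applies the same stopping rule to a plain power iteration. It is not the author's method: `PowerResult` and `power_iteration` are hypothetical names, the matrix is a generic std::vector of rows rather than Mtype, and the sketch assumes a symmetric matrix whose dominant eigenvalue is positive.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct PowerResult
{
    double      eigenvalue;   // estimate of the dominant eigenvalue
    std::size_t iterations;   // iterations actually performed
    bool        converged;    // false if max_iter was reached first
};

// Generic power iteration with the truncation-based stopping rule.
// Assumption: `a` is square, symmetric, with a positive dominant eigenvalue.
PowerResult power_iteration(const std::vector<std::vector<double>>& a,
                            std::size_t max_iter = 20000)
{
    const std::size_t n = a.size();
    if (n == 0) return {0.0, 0, false};

    // Non-uniform start vector, normalised to unit length.
    std::vector<double> v(n), w(n);
    double norm = 0.0;
    for (std::size_t k = 0; k < n; ++k)
    {
        v[k] = 1.0 + static_cast<double>(k);
        norm += v[k] * v[k];
    }
    norm = std::sqrt(norm);
    for (double& x : v) x /= norm;

    double lambda = 0.0;
    for (std::size_t iter = 1; iter <= max_iter; ++iter)
    {
        const long before = static_cast<long>(v[0] * 10000000.0);

        // w = A * v, accumulating the squared norm of w on the way.
        norm = 0.0;
        for (std::size_t i = 0; i < n; ++i)
        {
            double s = 0.0;
            for (std::size_t j = 0; j < n; ++j) s += a[i][j] * v[j];
            w[i] = s;
            norm += s * s;
        }
        norm = std::sqrt(norm);
        if (norm == 0.0) return {0.0, iter, false};

        for (std::size_t i = 0; i < n; ++i) v[i] = w[i] / norm;
        lambda = norm;

        const long after = static_cast<long>(v[0] * 10000000.0);
        if (after == before) return {lambda, iter, true};
    }
    return {lambda, max_iter, false};
}
```

With a rule like this the iteration count depends on floating-point rounding, so it can differ between builds. The excerpt already prints "Nb iter=", so comparing that figure across compilers would show whether a slow build is doing more iterations or the same iterations more slowly.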
Hi, No, I can't share all the code of my applications. To give an idea, it is about factor analysis, with and without the Eigen library. I made C and C++ versions. The tests compute the factors for 64 questions of a survey, which gives a square matrix of 306 items. The method used to extract the factors is algorithmic. On a MacBook Pro Intel 2020, the total computation in C++ or C takes about 10 seconds with clang 13.x but 54 seconds with clang 15. For C++, as I mentioned, the compiler options are:
    -std=c++17 -Ofast -march=native -funroll-loops -flto -DNDEBUG
For the C version, likewise:
    gcc -Ofast -march=native -funroll-loops -flto -DNDEBUG -o a file1.c file2.c cpp1.o file3.c -lm -lstdc++
The computation time suffers a big regression. I do not see where it comes from. djm44