Thanks for this information.
But for string manipulation with -march=native or -march=haswell I get very poor performance
compared with Linux guests under VMware or VirtualBox, or Windows under Boot Camp. It's almost 2 times slower.
I can send you the test.
I put it under the same FB number, as test2.zip:
FB13253046
or
FB13256895
It was impossible to re-upload, so I filed a new FB: FB13253046
Hi,
the FB number is FB13252912.
I sent a zip file which runs a test written purely in C. The same code compiled with the fastest options gives
a better run time on Linux VirtualBox and VMware guests and on Windows Boot Camp.
You'll see the difference by comparing clang 15 vs clang 14 on a MacBook Pro Intel 2020.
I checked with the -### option.
The target apple-macosx14.0.0 is invoked; is that right, or should it be apple-macosx15.0.0?
-march=native is effective, but it makes the code slower. With the previous version, -march=native made the code faster.
Is the disk encrypted by default with Sonoma, which could make code slower?
Sorry, I'd better put this in a reply than in a comment.
If you want, I can send you one of my examples: a group of about 15 small files in a zip file.
So -O3 makes the code slower than -Ofast.
I tried -Wl,-ld_classic; it makes no difference.
I notice that -march=native makes the code much slower.
My guess is that what changed in this latest version is -march=native,
maybe with less support for Intel processors.
With the previous version of clang, -march=native made the code faster.
I don't know assembly.
The fact is that my code and compiler flags have not changed; the changes are in clang 15.0 and are not documented.
The same code runs faster on Linux guests (VMware and VirtualBox) with GNU gcc or g++, and on Windows Boot Camp with MinGW gcc/g++.
To be precise, my programs have no graphical UI; they only print results to the terminal.
Hi,
I don't understand what you mean with clang -###.
I won't try every command-line option.
I am not able to dig through disassembly.
On Linux the clang version is 15.0.7, but I use GNU gcc.
What is -ld_classic?
I notice a global loss of run-time performance between
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
and
Apple clang version 15.0.0 (clang-1500.0.40.1)
And Apple made it impossible to revert to the 14.0.3 Command Line Tools.
Hi,
Yes, I am comparing things that are comparable: clang 15.0 versus the previous version, using the Command Line Tools, not Xcode, on the same MacBook Pro Intel 2020.
My previous toolchain was Command_Line_Tools_for_Xcode_14.3.1_Release_Candidate.
What I'm measuring is run time, not compile time. My code needs to be fast; I don't care about compile time.
By removing -march=native I get less bad performance.
For now, even with various modifications, it is 2 times slower than with the previous clang.
Something has changed in this latest version of the clang compiler that is not documented. I confirm that the same application in C or C++ runs about 2 times faster on the Linux
guests under VMware and VirtualBox, and also 2 times faster with MinGW g++ on Windows Boot Camp.
The only one that lacks performance is the latest Apple clang 15.0 (g++ or clang).
With the previous toolchain I used:
g++ -std=c++17 -Ofast -march=native -funroll-loops -flto -DNDEBUG -o a prog.cpp
With clang 15.0, to get less bad performance:
g++ -std=c++17 -Ofast -funroll-loops -flto -DNDEBUG -o a prog.cpp
Run times:
On Linux guests: about 11 seconds
On the previous clang: about 10 seconds
On Windows Boot Camp: about 11 seconds
On the latest Apple clang 15.0: about 22 seconds
Hi,
You mention slow compile times with Xcode, which also uses the Command Line Tools. But did you notice slower execution (run) time? If you can, please test it.
I did.
I am not used to mailing lists. I read the release notes of Apple clang 15.0; they are very bulky. I did not notice anything
about changes to the -Ofast option.
What specific flags does -Ofast add on top of -O3?
For my test, -Ofast was compatible with the computations.
KT is std::vector<double>
Mtype is a std::vector<double> with a different subscript
T is double
There are no assembler instructions in the code.
-O3 does better than -Ofast: 3 times slower instead of 5 times slower. You pointed out the right thing.
It seems the -Ofast option does not work any more. My opinion is that there is a security add-on that slows the code down: something that guards memory access or
systematically verifies array accesses or array bounds? Or some new default options that make code slower?
Both versions of my code, in C and C++, have been tested with Valgrind on Linux and show no errors (memory or array bounds).
I'm afraid the algorithmic part has nothing to do with the information I gave you; I can confirm that. Moreover, on my Mac I have two virtual machines for simple use, VMware and VirtualBox. The same code I mentioned earlier runs in 12 seconds on VMware and 13 seconds on VirtualBox, even though these two virtual machines use far fewer resources than the macOS host. To be precise, the regression appears with clang 15.0, not clang 13.x. So it is 5 times slower with g++ (clang 15.0).
If you want, here is part of the computational code:
template < typename T >
int OAnacorr<T>::compute_afc_for_burt()
{
size_t n = 0, i = 0 ;
T AKSI, ALAMBDA, PHI2, perc = 0 ;
T CUMUL ;
long TEST0, TEST1 ;
T TOTA = M.sum() ;
KT VL, VC ;
VC.resize( M.getnc() );
VL.resize( M.getnl() );
Mtype M2( M.getnl(), M.getnc() ) ;
KT PII ;
M.peek_sum_rows( PII ) ;
M.to_percent();
KT PJ ;
M.peek_sum_cols( PJ );
Mtype K2( M ) ;
KT KI( M.getnc() ) ;
CUMUL = 0.0 ;
PHI2 = K2.khi_deux_pond() ;
if( Mtype::isnan(PHI2) || PHI2 <= 0 )
return 1;
K2.peek_sum_cols( KI ) ;
KT VVI( KI ) ;
KT SVI( KI ) ;
Mtype TH = M.get_theoric() ;
/*****************************************/
KT VCSUP ;
KT KSUP ;
KT SKI ;
KT A, D ;
if( g_xsup > 0 )
{
VCSUP.resize( TSC.getnl());
KSUP.resize( TSC.getnl() );
A.resize ( VCSUP.size() );
D.resize ( VCSUP.size() ) ;
TSC.peek_sum_rows( KSUP );
Mtype STHEO ( TSC.getnl(), TSC.getnc() ) ;
for( size_t i = 0 ; i < TSC.getnl(); i++ )
for( size_t j = 0; j < TSC.getnc(); j++ )
STHEO(i,j) = (PII[j] * KSUP[i]) / TOTA ;
Mtype ECA = TSC - STHEO;
Mtype SKI2 = ( ECA * ECA ) / STHEO ;
SKI2 /= TOTA ;
SKI2.peek_sum_rows( SKI );
for( size_t i = 0; i < VCSUP.size(); i++ )
D[i] = KSUP[i] / TOTA ;
TSC.to_percent();
vect_to_percent( KSUP );
inth( TSC, KSUP, PJ, 1.0 );
}
os << "\nAnalyse des correspondances (AFC)" << std::endl << std::endl ;
os << "Phi Deux = " << std::setw(8) << std::setprecision(6) << std::fixed << PHI2 << std::endl;
for ( n = 0 ; n < g_nbf ; n++ )
{
if( ( 100.00 - CUMUL ) < 0.0000001 ) goto tend ;
vect_zero( M, VL ) ;
i=0 ;
do
{
TEST0 = (VL[0] * 10000000.0) ;
prod_by_cols( M, VC, VL ) ;
AKSI = reduce_by_pond( PJ, VC) ;
i++;
if( i > 20000 )
{
return 1 ;
}
prod_by_rows( M, VL, VC ) ;
AKSI = reduce_by_pond( PJ, VL ) ;
TEST1 = (VL[0] * 10000000.0) ;
}
while ( TEST1 != TEST0 );
if( g_xsup > 0 )
{
prod_by_rows( TSC, VCSUP, VC );
T RX = reduce_by_pond( KSUP, VCSUP );
mul_vect( VCSUP, RX );
for ( size_t i = 0 ; i < VCSUP.size(); i++ )
if( Mtype::isnan(VCSUP[i])) VCSUP[i] = 0 ;
WSUP.push_back( VCSUP );
}
mul_vect( VC, AKSI );
WWC.push_back( VC ) ;
ALAMBDA = ( n != 0 ) ? AKSI * AKSI : PHI2 ;
rebuild_pond( M2, VC, PJ, AKSI ) ;
M -= M2 ;
if( n == 0 )
{
WWC.push_back( PJ );
if( g_xsup > 0 )
{
WSUP.push_back(VCSUP);
}
}
if( n != 0 )
{
mul_and_div( M2, TH );
M2.peek_sum_cols( KI ) ;
SVI = KI ;
div_vect( SVI, VVI );
WWC.push_back( SVI );
perc = (ALAMBDA / PHI2) * 100 ;
CUMUL += perc ;
g_nbvectors += 1 ;
if( g_xsup > 0 )
{
for( size_t i = 0 ; i < VCSUP.size(); i++ )
{
A[i] = VCSUP[i] * VCSUP[i] * D[i] / SKI[i] ; // cos2
if ( Mtype::isnan(A[i])) A[i] = 0 ;
if( A[i] >= 1 ) A[i] = 0.999;
}
WSUP.push_back(A);
}
std::ostringstream ox ;
ox << "F" << n ;
os << std::setw(5) << std::setfill(' ') << std::left << ox.str()
<< " Val Propre = "
<< std::setw(8) << std::setprecision(6) << std::fixed << ALAMBDA
<< " Pourcent= " << std::setw(5) << std::setprecision(2) << std::right << std::fixed << perc
<< " Cumulé= " << std::setw(6) << std::setprecision(2) << std::right << CUMUL
<< " Nb iter= "
<< std::setw(5) << std::right << i << std::endl ;
}
div_vect( KI, ALAMBDA );
WWC.push_back( KI);
if( g_xsup > 0 )
{
for( size_t i = 0 ; i < VCSUP.size(); i++ )
{
A[i] = VCSUP[i] * VCSUP[i] * D[i] / ALAMBDA ; // cpf
if ( Mtype::isnan(A[i])) A[i] = 0 ;
if( A[i] >= 1 ) A[i] = 0.999;
}
WSUP.push_back(A);
}
}
tend:
g_nbf = n ;
os << std::endl;
return 0;
}
djm44
Hi,
No, I can't give you all the code of my applications. To give an idea: with and without the Eigen library, it is about factor analysis. I made C and C++ versions. The tests compute the factors of 64 survey questions, which gives a square matrix of 306 items. The method used to extract the factors is algorithmic.
On the MacBook Pro Intel 2020, the total computation in C++ or C takes about 10 seconds with clang 13.x but 54 seconds with clang 15.
For C++, as I mentioned, the compiler options are -std=c++17 -Ofast -march=native -funroll-loops -flto -DNDEBUG
For the C version, the same:
gcc -Ofast -march=native -funroll-loops -flto -DNDEBUG -o a file1.c file2.c cpp1.o file3.c -lm -lstdc++
The computation time has suffered a big regression, and I can't see where it comes from.
Hello,
In case it helps: qmake from Qt Creator LTS 6.2.4 does not work with clang 15.0. You probably have to use clang 13.0.