Hello,
If it helps: qmake from Qt Creator LTS 6.2.4 does not work with clang 15.0. You probably have to use clang 13.0.
Hi,
No, I can't share all the code of my applications. To give an idea: with and without the Eigen library, it is about factor analysis. I made C and C++ versions. The tests compute the factors of 64 questions of a survey, which gives a square matrix of 306 items. The method used to extract the factors is algorithmic.
On a MacBook Pro Intel 2020, the total computation in C++ or C takes about 10 seconds with clang 13.x but 54 seconds with clang 15.
For C++, as I mentioned, the compiler options are -std=c++17 -Ofast -march=native -funroll-loops -flto -DNDEBUG
For the C version, the same options:
gcc -Ofast -march=native -funroll-loops -flto -DNDEBUG -o a file1.c file2.c cpp1.o file3.c -lm -lstdc++
The computation time suffers a big regression. I do not see where it comes from.
I'm afraid the algorithmic portion is not the cause, and the information I gave you confirms this. Moreover, on my Mac I have two virtual machines for simple use, VMware and VirtualBox. The same code I mentioned earlier runs in 12 seconds on VMware and 13 seconds on VirtualBox, even though these two virtual machines use far fewer resources than the host macOS. To be precise, the regression appears with clang 15.0, not clang 13.x. So it is 5 times slower with g++ (clang 15.0).
If you want, here is part of the computational code:
template < typename T >
int OAnacorr<T>::compute_afc_for_burt()
{
    size_t n = 0, i = 0 ;
    T AKSI, ALAMBDA, PHI2, perc = 0 ;
    T CUMUL ;
    long TEST0, TEST1 ;
    T TOTA = M.sum() ;
    KT VL, VC ;
    VC.resize( M.getnc() );
    VL.resize( M.getnl() );
    Mtype M2( M.getnl(), M.getnc() ) ;
    KT PII ;
    M.peek_sum_rows( PII ) ;
    M.to_percent();
    KT PJ ;
    M.peek_sum_cols( PJ );
    Mtype K2( M ) ;
    KT KI( M.getnc() ) ;
    CUMUL = 0.0 ;
    PHI2 = K2.khi_deux_pond() ;
    if( Mtype::isnan(PHI2) || PHI2 <= 0 )
        return 1;
    K2.peek_sum_cols( KI ) ;
    KT VVI( KI ) ;
    KT SVI( KI ) ;
    Mtype TH = M.get_theoric() ;
    /*****************************************/
    KT VCSUP ;
    KT KSUP ;
    KT SKI ;
    KT A, D ;
    if( g_xsup > 0 )
    {
        VCSUP.resize( TSC.getnl());
        KSUP.resize( TSC.getnl() );
        A.resize ( VCSUP.size() );
        D.resize ( VCSUP.size() ) ;
        TSC.peek_sum_rows( KSUP );
        Mtype STHEO ( TSC.getnl(), TSC.getnc() ) ;
        for( size_t i = 0 ; i < TSC.getnl(); i++ )
            for( size_t j = 0; j < TSC.getnc(); j++ )
                STHEO(i,j) = (PII[j] * KSUP[i]) / TOTA ;
        Mtype ECA = TSC - STHEO;
        Mtype SKI2 = ( ECA * ECA ) / STHEO ;
        SKI2 /= TOTA ;
        SKI2.peek_sum_rows( SKI );
        for( size_t i = 0; i < VCSUP.size(); i++ )
            D[i] = KSUP[i] / TOTA ;
        TSC.to_percent();
        vect_to_percent( KSUP );
        inth( TSC, KSUP, PJ, 1.0 );
    }
    os << "\nAnalyse des correspondances (AFC)" << std::endl << std::endl ;
    os << "Phi Deux = " << std::setw(8) << std::setprecision(6) << std::fixed << PHI2 << std::endl;
    for ( n = 0 ; n < g_nbf ; n++ )
    {
        if( ( 100.00 - CUMUL ) < 0.0000001 ) goto tend ;
        vect_zero( M, VL ) ;
        i=0 ;
        do
        {
            TEST0 = (VL[0] * 10000000.0) ;
            prod_by_cols( M, VC, VL ) ;
            AKSI = reduce_by_pond( PJ, VC) ;
            i++;
            if( i > 20000 )
            {
                return 1 ;
            }
            prod_by_rows( M, VL, VC ) ;
            AKSI = reduce_by_pond( PJ, VL ) ;
            TEST1 = (VL[0] * 10000000.0) ;
        }
        while ( TEST1 != TEST0 );
        if( g_xsup > 0 )
        {
            prod_by_rows( TSC, VCSUP, VC );
            T RX = reduce_by_pond( KSUP, VCSUP );
            mul_vect( VCSUP, RX );
            for ( size_t i = 0 ; i < VCSUP.size(); i++ )
                if( Mtype::isnan(VCSUP[i])) VCSUP[i] = 0 ;
            WSUP.push_back( VCSUP );
        }
        mul_vect( VC, AKSI );
        WWC.push_back( VC ) ;
        ALAMBDA = ( n != 0 ) ? AKSI * AKSI : PHI2 ;
        rebuild_pond( M2, VC, PJ, AKSI ) ;
        M -= M2 ;
        if( n == 0 )
        {
            WWC.push_back( PJ );
            if( g_xsup > 0 )
            {
                WSUP.push_back(VCSUP);
            }
        }
        if( n != 0 )
        {
            mul_and_div( M2, TH );
            M2.peek_sum_cols( KI ) ;
            SVI = KI ;
            div_vect( SVI, VVI );
            WWC.push_back( SVI );
            perc = (ALAMBDA / PHI2) * 100 ;
            CUMUL += perc ;
            g_nbvectors += 1 ;
            if( g_xsup > 0 )
            {
                for( size_t i = 0 ; i < VCSUP.size(); i++ )
                {
                    A[i] = VCSUP[i] * VCSUP[i] * D[i] / SKI[i] ; // cos2
                    if ( Mtype::isnan(A[i])) A[i] = 0 ;
                    if( A[i] >= 1 ) A[i] = 0.999;
                }
                WSUP.push_back(A);
            }
            std::ostringstream ox ;
            ox << "F" << n ;
            os << std::setw(5) << std::setfill(' ') << std::left << ox.str()
               << " Val Propre = "
               << std::setw(8) << std::setprecision(6) << std::fixed << ALAMBDA
               << " Pourcent= " << std::setw(5) << std::setprecision(2) << std::right << std::fixed << perc
               << " Cumulé= " << std::setw(6) << std::setprecision(2) << std::right << CUMUL
               << " Nb iter= "
               << std::setw(5) << std::right << ((n>0) ? i : i) << std::endl ;
        }
        div_vect( KI, ALAMBDA );
        WWC.push_back( KI);
        if( g_xsup > 0 )
        {
            for( size_t i = 0 ; i < VCSUP.size(); i++ )
            {
                A[i] = VCSUP[i] * VCSUP[i] * D[i] / ALAMBDA ; // cpf
                if ( Mtype::isnan(A[i])) A[i] = 0 ;
                if( A[i] >= 1 ) A[i] = 0.999;
            }
            WSUP.push_back(A);
        }
    }
tend:
    g_nbf = n ;
    os << std::endl;
    return 0;
}
KT is std::vector<double>
Mtype is std::vector<double> with different subscripting
T is double
There are no assembler instructions.
-O3 does better than -Ofast: 3 times slower instead of 5 times slower. You pointed out the right thing.
It seems the -Ofast option does not work any more. My opinion: is there a security add-on that blocks the code, something that blocks memory access or that systematically verifies array accesses or array bounds? Or some new default options that make code slower?
Both versions of my code, in C and C++, have been tested with valgrind on Linux and give no errors (memory and array bounds).
I am not used to mailing lists. I read the release notes of Apple clang 15.0. They are very bulky. I did not notice anything about changes to the -Ofast command-line option.
What specific flags does -Ofast add to -O3?
For my test, -Ofast was compatible with the computations.
Hi,
You mention slow compile times with Xcode, which also uses the command-line tools. But did you notice slower execution (run time)? Please test if you can.
I did.
Hi ,
Yes, I compare things that are comparable: clang 15.0 with the previous version, using the command-line tools, not Xcode, on the same MacBook Pro Intel 2020.
My previous toolchain was Command_Line_Tools_for_Xcode_14.3.1_Release_Candidate.
What I'm measuring is run time, not compile time. My code only makes sense if it is fast; I do not care about compile time.
By removing -march=native I get less bad performance.
For now, with various modifications, it is 2 times slower than with the previous clang.
Some things have changed in this latest version of the clang compiler that are not documented. I confirm the same application in C or C++ runs about 2 times faster on the Linux guests (VMware and VirtualBox), and likewise 2 times faster with MinGW C++ on Windows Boot Camp.
The only one that lacks performance is the latest Apple clang 15.0 (g++ or clang).
With the previous toolchain I used:
g++ -std=c++17 -Ofast -march=native -funroll-loops -flto -DNDEBUG -o a prog.cpp
With clang 15.0, to get less bad performance:
g++ -std=c++17 -Ofast -funroll-loops -flto -DNDEBUG -o a prog.cpp
The run times:
On Linux guests: about 11 seconds
On the previous clang: about 10 seconds
On Windows Boot Camp: about 11 seconds
On the latest Apple clang 15.0: about 22 seconds
Hi ,
I don't see what you mean by clang -###.
I won't try every command-line option.
I am not able to search through disassemblies.
On Linux the clang version is 15.0.7, but I use GNU gcc.
What is -ld_classic?
I notice a global loss of run-time performance between
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
and
Apple clang version 15.0.0 (clang-1500.0.40.1)
And Apple made it impossible to revert to the 14.0.3 command-line tools.
So -O3 makes the code slower than -Ofast.
I tried -Wl,-ld_classic; it makes no difference.
I notice that -march=native makes the code very much slower.
I guess what changed with this latest version is -march=native.
Maybe there is less support for Intel processors.
With the previous version of clang, -march=native made the code faster.
I don't know about assembly.
The fact is that my code and the compiler flags have not changed; the changes are in clang 15.0 and are not documented.
The same code runs faster on the Linux guests (VMware and VirtualBox) with GNU gcc or g++,
and on Windows Boot Camp with MinGW gcc and g++.
To be precise, my code does not use a graphical UI. It only prints results to the terminal.
I checked with the -### option.
apple-macosx14.0.0 is invoked. Is that right, or should it be apple-macosx15.0.0?
-march=native is effective, but it makes the code slower. With the previous version, -march=native made the code faster.
Is the disk encrypted by default on Sonoma, which could make code slower?
Sorry, I had better put this in a reply than in a comment.
If you want, I can send you one of my examples: a group of about 15 small files in a zip archive.
Hi ,
The FB number is FB13252912.
I sent a zip file which can run a test purely in C. The same code compiled with the fastest options gives
a better run time on Linux guests (VirtualBox, VMware) and on Windows Boot Camp.
You'll see the difference by comparing clang 15 with clang 14 on a MacBook Pro Intel 2020.
It was impossible to re-upload, so here is a new FB:
FB13253046
Thanks for this information.
But on string manipulation, with -march=native or -march=haswell, I get very poor performance
compared with the Linux guests (VMware or VirtualBox) or Windows Boot Camp. It's almost 2 times slower.
I can send you the test.
I put it under the same FB number as test2.zip:
FB13253046
or
FB13256895