Post

Replies

Boosts

Views

Activity

Which instructions are included in the hardware event `INST_SIMD_ALU`?
I asked this on StackOverflow too, but did not get a response. Copying verbatim (images might not work as expected). Short question: which instructions other than floating point arithmetic instructions like fmul, fadd, fdiv etc are counted under the hardware event INST_SIMD_ALU in XCode Instruments? Alternatively, how can I count the number of floating point operations in a program using CPU counters? I want to measure/estimate the FLOPs count of my program and thought that CPU counters might be a good tool for this. The closest hardware event mnemonic that I could find is INST_SIMD_ALU, whose description reads. Retired non-load/store Advanced SIMD and FP unit instructions So, as a sanity check I wrote a tiny Swift code with ostensibly predictable FLOPs count. let iterCount = 1_000_000_000 var x = 3.1415926 let a = 2.3e1 let ainv = 1 / a // avoid inf for _ in 1...iterCount { x *= a x += 1.0 x -= 6.1 x *= ainv } So, I expect there to be around 4 * iterCount = 4e9 FLOPs. But, on running this under CPU Counters with the event INST_SIMD_ALU I get a count of 5e9, 1 extra FLOP per loop iteration than expected. See screenshot below. dumbLoop is the name of the function that I wrapped the code in. Here is the assembly for the loop +0x3c fmul d0, d0, d1 <---------------------------------- +0x40 fadd d0, d0, d2 | +0x44 fmov d4, x10 | +0x48 fadd d0, d0, d4 | +0x4c fmul d0, d0, d3 | +0x50 subs x9, x9, #0x1 | +0x54 b.ne "specialized dumbLoop(_:initialValue:)+0x3c" --- Since it's non-load/store instructions, it shouldn't be counting fmov and b.ne. That leaves subs, which is an integer subtraction instruction used for decrementing the loop counter. So, I ran two more "tests" to see if the one extra count comes from subs. On running it again with CPU Counters with the hardware event INST_INT_ALU, I found a count of one billion, which adds up with the number of loop decrements. Just to be sure, I unrolled the loop by a factor of 4, so that the number of loop decrements becomes 250 million from one billion. let iterCount = 1_000_000_000 var x = 3.1415926 let a = 2.3e1 let ainv = 1 / a // avoid inf let n = Int(iter_count / 4) for _ in 1...n { x *= a x += 1.0 x -= 6.1 x *= ainv x *= a x += 1.0 x -= 6.1 x *= ainv x *= a x += 1.0 x -= 6.1 x *= ainv x *= a x += 1.0 x -= 6.1 x *= ainv } print(x) And it adds up, around 250 million integer ALU instructions, and the total ALU instructions is 4.23 billion, somewhat short of the expected 4.25 billion. So, at the moment if I want to count the FLOPs in my program, one estimate I can use is INST_SIMD_ALU - INST_INT_ALU. But, is this description complete, or are there an other instructions that I might spuriously count as floating point operations? Is there a better way to count the number of FLOPs?
0
0
506
Apr ’24