There are some related posts on the forums that seem inconclusive and also a bit more complex. I pared my testing down to a couple of brief XCTests with self.measure blocks around repeated add and multiply operations on two Double values. The tests use random initial values so the compiler can't optimize the loop calculations based on constants. There's also no big collection of fixture data, so there's no chance allocations, vector index dereferences, or similar issues could be involved.
The regular multiple-instruction code runs an order of magnitude faster than the SIMD code! I don't understand why this is. Would the SIMD code be faster in C++? Could some Swift conversion be involved? Or is there some aspect of my SIMD code that incurs a known penalty? I'm curious whether anyone out there is using SIMD in Swift in production, and whether you see anything in my test code that explains the difference.
```swift
func testPerformance_double() {
    var xL = Double.random(in: 0.0...1.0)
    var yL = Double.random(in: 0.0...1.0)
    let xR = Double.random(in: 0.0...1.0)
    let yR = Double.random(in: 0.0...1.0)
    let increment = Double.random(in: 0.0...0.1)
    Swift.print("xL: \(xL), xR: \(xR), increment: \(increment)")
    var result: Double = 0.0
    self.measure {
        for _ in 0..<100000 {
            result = xL + xR
            result = yL + yR
            result = xL * xR
            result = yL * yR
            xL += increment
            yL += increment
        }
    }
    Swift.print("last result: \(result)") // read from result
}
```
```swift
func testPerformance_simd() {
    var vL = simd_double2(Double.random(in: 0.0...1.0), Double.random(in: 0.0...1.0))
    let vR = simd_double2(Double.random(in: 0.0...1.0), Double.random(in: 0.0...1.0))
    let increment = Double.random(in: 0.0...0.1)
    let vIncrement = simd_double2(increment, increment)
    var result = simd_double2(0.0, 0.0)
    Swift.print("vL.x: \(vL.x), vL.y: \(vL.y), increment: \(increment)")
    self.measure {
        for _ in 0..<100000 {
            result = vL + vR
            result = vL * vR
            vL = vL + vIncrement
        }
    }
    Swift.print("last result: \(String(describing: result))")
}
```
The measurements show the block with SIMD operations taking an order of magnitude more time than the block of multiple scalar operations!
...testPerformance_double measured [Time, seconds] average: 0.049, relative standard deviation: 3.059%, values: [0.049262, 0.049617, 0.048499, 0.047859, 0.048270, 0.048564, 0.047529, 0.052578, 0.047267, 0.047432], performanceMetricID:com.apple.XCTPerformanceMetric_WallClockTime, baselineName: "", baselineAverage: , maxPercentRegression: 10.000%, maxPercentRelativeStandardDeviation: 10.000%, maxRegression: 0.100, maxStandardDeviation: 0.100
...testPerformance_simd measured [Time, seconds] average: 0.579, relative standard deviation: 5.932%, values: [0.626196, 0.605790, 0.635180, 0.611197, 0.553179, 0.548163, 0.552648, 0.549264, 0.552745, 0.551465], performanceMetricID:com.apple.XCTPerformanceMetric_WallClockTime, baselineName: "", baselineAverage: , maxPercentRegression: 10.000%, maxPercentRelativeStandardDeviation: 10.000%, maxRegression: 0.100, maxStandardDeviation: 0.100
XCTest performance tests can work great for benchmarking and investigating alternate implementations, even at the micro level, but the trick is to make sure you're not measuring code built for Debug or running under the debugger.
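As a quick sanity check, a performance test can fail fast when it's accidentally running unoptimized code. Here's a minimal sketch, relying on the fact that Swift's assert() conditions are only evaluated in -Onone builds; the isAssertEnabled helper name is my own:

```swift
import XCTest

/// Returns true when assert() conditions are evaluated,
/// i.e. the code was compiled without optimization (-Onone).
func isAssertEnabled() -> Bool {
    var evaluated = false
    assert({ evaluated = true; return true }())
    return evaluated
}

final class PerformanceGuardTests: XCTestCase {
    func testPerformance_guarded() {
        // Fail loudly instead of publishing misleading debug-build numbers.
        XCTAssertFalse(isAssertEnabled(), "Performance test is running an unoptimized (-Onone) build")
        // ... self.measure { ... } goes here ...
    }
}
```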
I now have XCTest running the performance tests from my original post and showing meaningful (and actionable) results. On my current machine, the 100000 regular Double calculation block has an average measurement of 0.000328 s, while the simd_double2 test block has an average measurement of 0.000257 s, which is about 78% of the non-SIMD time, very close to the difference I measured in my release build. So now I can reliably measure what performance gains I'll get from SIMD and other Accelerate APIs as I decide whether to adopt.
Here's the approach I recommend:
1. Put all of your performance XCTests in files separate from your functional tests, so a separate target can compile them (see the sketch after this list).
2. Create a separate Performance Test target in the Xcode project. If you already have a unit test target, it's easy to duplicate it and rename the copy.
3. Divide your tests between these targets: functional tests only in the original Unit Test target, performance tests only in the Performance Test target.
4. Create a new Performance Test scheme associated with the Performance Test target.
5. THE IMPORTANT PART: Edit the Performance Test scheme's Test action: set its Build Configuration to Release, uncheck Debug Executable, and uncheck everything under Diagnostics. This ensures that when you run Product > Test, it's Release-optimized code that gets run for your performance measurements.
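For step 1, a dedicated performance-test file might look like the following. This is a minimal sketch; the file and class names are placeholders of mine:

```swift
// MathPerformanceTests.swift
// Member of the Performance Test target only, not the Unit Test target.
import XCTest
import simd

final class MathPerformanceTests: XCTestCase {
    func testPerformance_simd() {
        var vL = simd_double2(Double.random(in: 0.0...1.0), Double.random(in: 0.0...1.0))
        let vR = simd_double2(Double.random(in: 0.0...1.0), Double.random(in: 0.0...1.0))
        let vIncrement = simd_double2(repeating: Double.random(in: 0.0...0.1))
        var result = simd_double2()
        self.measure {
            for _ in 0..<100000 {
                result = vL + vR
                result = vL * vR
                vL += vIncrement
            }
        }
        Swift.print("last result: \(result)") // keep result live so the loop isn't optimized away
    }
}
```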
The remaining problem is that your main app's scheme has only one build configuration setting for its Test action (Debug or Release). Assuming it's set to Debug, running a performance test ad hoc under that scheme will show the behavior from my OP, with SIMD code especially running orders of magnitude slower.
I do want my main app's test configuration to remain Debug for working with functional unit test code. So to make performance tests work tolerably in this scenario, I edited the build settings of the Performance Test target (only) so that its Debug settings were closer to Release, the key setting being Swift Compiler - Code Generation > Optimization Level, changed from No Optimization [-Onone] to Optimize for Speed [-O]. While I don't think this is quite as accurate as running under the Performance Test scheme with the Release configuration and all other debug options disabled, I can now run the performance tests under my main app's scheme and see reasonable results: SIMD time again lands in the 75-80% range of non-SIMD time for the test in question.
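If you manage build settings in xcconfig files, the equivalent change is one line. A sketch, assuming a hypothetical PerformanceTests.xcconfig assigned to the Performance Test target's Debug configuration:

```
// PerformanceTests.xcconfig (hypothetical file, applied to the Performance Test target only)
// Swift Compiler - Code Generation > Optimization Level: Optimize for Speed
SWIFT_OPTIMIZATION_LEVEL = -O
```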