Adding the @inlinable
attribute to library methods turns out to be the most important step to activate more aggressive compiler optimization across modules, some of which are placed in Swift packages. A Vector
struct in the library (Swift package) uses this one-liner method very frequently:
// @inline(__always)
@inlinable
mutating func elem(_ k: Int, _ num: Grain) {
_array[istart + k*stride] = num
}
Replacing @inline(__always)
with @inlinable
solved the problem, and the running times of a simple test function, which increments a collection of 1000 random integers, reduced from 92 microseconds to 3 microseconds. This is quite surprising to me, as inlining this simple line sounds like a good option. However, it turns out that the compiler does a much better job with possibly having more flexibility with the @inlinable
option.
Before solving the problem as described above, after posting this question yesterday, I started writing a simple package and a project importing it to reproduce the problem. I used the default Package.swift
file and the default macOS command line project settings. Here is the package side code:
import Foundation
public struct Vector<Element> where Element: Numeric {
var data: [Element]
public init(_ data: [Element]) {
self.data = data
}
public var count: Int {
data.count
}
public subscript(k: Int) -> Element {
get {
return data[k]
}
set(num) {
data[k] = num
}
}
// @inlinable
@inline(__always)
public static func +(left: Self, right: [Element]) -> Self {
var y = Array<Element>(repeating: 0, count: left.count)
for k in 0..<left.count {
y[k] = left[k] + right[k]
}
return Self(y)
}
}
And the project-side code:
import Foundation
import SimplePackage
let N = 10000
let vArray = (1...N).map { _ in Int.random(in: 0...10) }
let v = Vector<Int>(vArray)
let w = (1...N).map { _ in Int.random(in: 0...10) }
let start = DispatchTime.now()
let y = v + w
let end = DispatchTime.now()
let nanoseconds = end.uptimeNanoseconds - start.uptimeNanoseconds
print("\(y[0] + y[5] + y[10])")
print("Execution time: \(nanoseconds/1000) microsecs")
With inline(__always)
, my computer prints out 445 microseconds, which is the shortest runtime out of at least 10 runs. When I remove inline(__always)
(no use of attributes), I get similar times but the shortest one was 413 microseconds. If I use the @inlinable
attribute for the plus operator, then I get 12 microseconds on an M3 Max MacBook Pro machine.