Reply to Proper Way of Setting the Cross Module Optimization
Adding the @inlinable attribute to library methods turned out to be the most important step in enabling more aggressive compiler optimization across modules, some of which live in Swift packages. A Vector struct in the library (a Swift package) calls this one-line method very frequently:

    // @inline(__always)
    @inlinable
    mutating func elem(_ k: Int, _ num: Grain) {
        _array[istart + k*stride] = num
    }

Replacing @inline(__always) with @inlinable solved the problem: the running time of a simple test function, which increments a collection of 1000 random integers, dropped from 92 microseconds to 3 microseconds. This surprised me, since forcing such a simple line to be inlined sounds like a good option. However, it turns out that the compiler does a much better job with @inlinable, possibly because it has more flexibility.

Before solving the problem as described above, and after posting this question yesterday, I started writing a simple package and a project that imports it to reproduce the problem. I used the default Package.swift file and the default macOS command line project settings.
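For context on the elem method quoted earlier in this reply: @inlinable makes a function's body part of the module's public interface, so a client module can inline and specialize it, and any non-public declaration the body touches has to be marked @usableFromInline. Here is a minimal sketch of what that could look like. Only elem, _array, istart, stride and Grain come from my library; the StridedVector name, the Grain alias to Int, the initializer and the getter are made up for illustration.

```swift
// Hypothetical stand-in for the library's element type (assumption).
public typealias Grain = Int

// Hypothetical strided view over a flat array; not the real Vector struct.
public struct StridedVector {
    // Internal storage must be @usableFromInline so that the
    // @inlinable bodies below are allowed to reference it.
    @usableFromInline var _array: [Grain]
    @usableFromInline let istart: Int
    @usableFromInline let stride: Int

    public init(_ array: [Grain], istart: Int = 0, stride: Int = 1) {
        self._array = array
        self.istart = istart
        self.stride = stride
    }

    // Setter from the post: the body is exported to client modules.
    @inlinable
    public mutating func elem(_ k: Int, _ num: Grain) {
        _array[istart + k * stride] = num
    }

    // Matching getter, added here only so the sketch can be checked.
    @inlinable
    public func elem(_ k: Int) -> Grain {
        _array[istart + k * stride]
    }
}

var v = StridedVector([0, 0, 0, 0, 0, 0], istart: 1, stride: 2)
v.elem(2, 42)        // writes to flat index 1 + 2*2 = 5
print(v.elem(2))     // prints 42
```

In a single file the attributes change nothing measurable; the effect only shows up when the struct lives in a separate module or package and is called from the importing target.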
Here is the package-side code:

    import Foundation

    public struct Vector<Element> where Element: Numeric {
        var data: [Element]

        public init(_ data: [Element]) {
            self.data = data
        }

        public var count: Int { data.count }

        public subscript(k: Int) -> Element {
            get { return data[k] }
            set(num) { data[k] = num }
        }

        // @inlinable
        @inline(__always)
        public static func +(left: Self, right: [Element]) -> Self {
            var y = Array<Element>(repeating: 0, count: left.count)
            for k in 0..<left.count {
                y[k] = left[k] + right[k]
            }
            return Self(y)
        }
    }

And the project-side code:

    import Foundation
    import SimplePackage

    let N = 10000
    let vArray = (1...N).map { _ in Int.random(in: 0...10) }
    let v = Vector<Int>(vArray)
    let w = (1...N).map { _ in Int.random(in: 0...10) }

    let start = DispatchTime.now()
    let y = v + w
    let end = DispatchTime.now()

    let nanoseconds = end.uptimeNanoseconds - start.uptimeNanoseconds
    print("\(y[0] + y[5] + y[10])")
    print("Execution time: \(nanoseconds/1000) microsecs")

With @inline(__always), my computer prints 445 microseconds, which is the shortest runtime out of at least 10 runs. When I remove @inline(__always) (no attributes at all), I get similar times, with the shortest at 413 microseconds. If I use the @inlinable attribute on the plus operator instead, I get 12 microseconds. All measurements are from an M3 Max MacBook Pro.
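Since I report the shortest of at least 10 runs, here is a rough sketch of how that measurement could be automated instead of re-running the program by hand. The shortestRun helper is hypothetical (it is not part of the project above), and the zip-based addition is only a stand-in workload so the snippet runs on its own; in the real project the closure body would be the v + w call.

```swift
import Foundation

// Hypothetical helper: runs `body` `runs` times and returns the shortest
// elapsed time in nanoseconds (UInt64.max if `runs` is 0).
func shortestRun(of runs: Int = 10, _ body: () -> Void) -> UInt64 {
    var best = UInt64.max
    for _ in 0..<runs {
        let start = DispatchTime.now()
        body()
        let end = DispatchTime.now()
        best = min(best, end.uptimeNanoseconds - start.uptimeNanoseconds)
    }
    return best
}

let n = 10_000
let a = (1...n).map { _ in Int.random(in: 0...10) }
let b = (1...n).map { _ in Int.random(in: 0...10) }

// Accumulate a value from each run so the work cannot be optimized away.
var checksum = 0
let best = shortestRun {
    let sum = zip(a, b).map { $0 + $1 }   // stand-in for `v + w`
    checksum &+= sum[0] + sum[5] + sum[10]
}

print("checksum: \(checksum)")
print("Shortest time: \(best / 1000) microsecs")
```

Taking the minimum rather than the mean filters out runs disturbed by other activity on the machine, which matches how I picked the numbers quoted above.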
Apr ’24