Proper Way of Setting the Cross Module Optimization

Hello,

I have been developing a Swift package and importing it into other projects. When I import the package into a project and check the execution times of some package functions, I notice slow execution times. However, if I add the source files of the package directly into the project and run the same test functions, then the whole module optimization achieves amazing run times, which are tens of times faster. The package heavily uses generics, and the project is a command line project.

I would like to ask if there is a proper way to edit project build settings and the package.swift so that the compiler performs cross-module optimization and achieves hopefully similar run times with the whole-module optimization case.

Some previous posts indicated the use of -cross-module-optimization and SWIFT_CROSS_MODULE_OPTIMIZATION, and I tried them in numerous places in the project target build settings with no success. I also added this Swift setting to the package.swift:

            swiftSettings: [
                .unsafeFlags(["-cross-module-optimization"],
                .when(configuration: .release)) ]

but it didn't lead to faster run times. I appreciate any comments/help. Thanks.

Adding the @inlinable attribute to library methods turns out to be the most important step to activate more aggressive compiler optimization across modules, some of which are placed in Swift packages. A Vector struct in the library (Swift package) uses this one-liner method very frequently:

    // @inline(__always)
    @inlinable
    mutating func elem(_ k: Int, _ num: Grain) {
        _array[istart + k*stride] = num
    }

Replacing @inline(__always) with @inlinable solved the problem, and the running times of a simple test function, which increments a collection of 1000 random integers, reduced from 92 microseconds to 3 microseconds. This is quite surprising to me, as inlining this simple line sounds like a good option. However, it turns out that the compiler does a much better job with possibly having more flexibility with the @inlinable option.

Before solving the problem as described above, after posting this question yesterday, I started writing a simple package and a project importing it to reproduce the problem. I used the default Package.swift file and the default macOS command line project settings. Here is the package side code:

import Foundation

public struct Vector<Element> where Element: Numeric {
    var data: [Element]
    

    public init(_ data: [Element]) {
        self.data = data
    }
    
    
    public var count: Int {
        data.count
    }
    
    
    public subscript(k: Int) -> Element {
        get {
            return data[k]
        }
        set(num) {
            data[k] = num
        }
    }
    
    
//    @inlinable
    @inline(__always)
    public static func +(left: Self, right: [Element]) -> Self {
        var y = Array<Element>(repeating: 0, count: left.count)
        for k in 0..<left.count {
            y[k] = left[k] + right[k]
        }
        return Self(y)
    }
}

And the project-side code:

import Foundation
import SimplePackage

let N = 10000
let vArray = (1...N).map { _ in Int.random(in: 0...10) }
let v = Vector<Int>(vArray)
let w = (1...N).map { _ in Int.random(in: 0...10) }

let start = DispatchTime.now()
let y = v + w
let end = DispatchTime.now()

let nanoseconds = end.uptimeNanoseconds - start.uptimeNanoseconds
print("\(y[0] + y[5] + y[10])")
print("Execution time: \(nanoseconds/1000) microsecs")

With inline(__always), my computer prints out 445 microseconds, which is the shortest runtime out of at least 10 runs. When I remove inline(__always) (no use of attributes), I get similar times but the shortest one was 413 microseconds. If I use the @inlinable attribute for the plus operator, then I get 12 microseconds on an M3 Max MacBook Pro machine.

Proper Way of Setting the Cross Module Optimization
 
 
Q