Generating Pseudorandom numbers very quickly using Swift?

Hello,

There is some Metal sample code that is written in Objective-C (https://developer.apple.com/documentation/metal/performing_calculations_on_a_gpu). I wanted to implement this in Swift, but am running into a major performance issue.

The sample code fills two large buffers with pseudorandom data using the rand() function, and does so very quickly (a fraction of a second for ~14 million elements). The rand() function is not available in Swift, so in my Swift version I have tried many other methods for generating the data, but on my M1 Pro they all take between 6 and 10 seconds for the same ~14 million elements.

Surely there must be some method in Swift that can approximate the general speed of rand(). I'm more than willing to trade randomness for speed!

Any ideas?

Answered by 13fpl in 726221022

Good idea about changing the Obj-C code to use arc4random() instead of rand(). I did so, and have found that for the 16.7 million floats (1 << 24) in Apple's sample code, rand() takes about 0.13 seconds, while arc4random() takes about 1.25 seconds.

I tried a few more things with the Swift code that have produced some very interesting results. Switching from a for-in loop to a while loop (while idx < arrayLength) drastically reduces the execution time.

let randomRange: ClosedRange<Float> = 0...Float(100.0)
let arrayLength = (1 << 24)

var buffer: [Float] = Array(repeating: 0.0, count: arrayLength)
var idx: Int = 0
while idx < arrayLength {
    buffer[idx] = Float.random(in: randomRange)
    idx += 1
}

The loop in the above code runs in 5.12 seconds - less than half the time that for idx in 0..<arrayLength takes!

This promising result led me back to using GameplayKit.

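import GameplayKit  // GKLinearCongruentialRandomSource lives in GameplayKit

let arrayLength = (1 << 24)
let randomSource = GKLinearCongruentialRandomSource()
var buffer: [Float] = Array(repeating: 0.0, count: arrayLength)

var idx: Int = 0
while idx < arrayLength {
    // Note: nextInt() is documented to span the full signed 32-bit range,
    // so this ratio lands in roughly -1...1 rather than 0...1.
    buffer[idx] = Float(randomSource.nextInt())/Float(RAND_MAX)
    idx += 1
}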

The loop in the above code runs in 0.86 seconds!

This is still significantly slower than rand(), but I'm fine with it. I may look more into other implementations in the future.
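As one example of the "other implementations" mentioned above, here is a minimal sketch of a pure-Swift generator. It uses SplitMix64, a small non-cryptographic algorithm that is not discussed anywhere in this thread; the names SplitMix64, nextUnitFloat and makeRandomFloats are made up for illustration. I have not timed it against the numbers above, so treat it as something to measure rather than a known win.

```swift
/// SplitMix64: a small, fast, non-cryptographic generator.
/// (Hypothetical helper for illustration; not part of the original thread.)
struct SplitMix64: RandomNumberGenerator {
    private var state: UInt64

    init(seed: UInt64) {
        state = seed
    }

    /// Advances the state and returns the next 64 pseudorandom bits.
    mutating func next() -> UInt64 {
        state &+= 0x9E37_79B9_7F4A_7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58_476D_1CE4_E5B9
        z = (z ^ (z >> 27)) &* 0x94D0_49BB_1331_11EB
        return z ^ (z >> 31)
    }

    /// Maps the top 24 bits of the next output to a Float in 0..<1.
    mutating func nextUnitFloat() -> Float {
        return Float(next() >> 40) * (1.0 / 16_777_216.0)  // divide by 2^24
    }
}

/// Builds an array of `count` floats scaled into 0...upperBound.
func makeRandomFloats(count: Int, upperBound: Float, seed: UInt64) -> [Float] {
    var generator = SplitMix64(seed: seed)
    var result: [Float] = []
    result.reserveCapacity(count)
    for _ in 0..<count {
        result.append(generator.nextUnitFloat() * upperBound)
    }
    return result
}
```

Because the struct conforms to RandomNumberGenerator, it can also be passed to the standard library's Float.random(in:using:), although the direct bit-to-float conversion shown here skips the range handling that call performs.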

I appreciate the feedback and suggestions. It really helped get me thinking.

Could you show the code where you generate the array?

Sure, I will post a few snippets. These are all on an M1 Pro.

Creating an array of data for later passing in to the Metal buffer:

import Foundation  // provides DispatchTime

let arrayLength = (1 << 24)

let randomRange: ClosedRange<Float> = 0...Float(100.0)
var randomFloats: [Float] = []
randomFloats.reserveCapacity(arrayLength)

print("Capacity reserved")

let start = DispatchTime.now()

for _ in 0..<arrayLength {
    randomFloats.append(Float.random(in: randomRange))
}

let end = DispatchTime.now()
let totalTime = Double(end.uptimeNanoseconds - start.uptimeNanoseconds) /  1_000_000_000.0
print("Total time \(totalTime) seconds.")

The output for this one is:

Capacity reserved
Total time 11.37157275 seconds.
Program ended with exit code: 0

Here we try to create the Metal buffer directly:

import Foundation  // provides DispatchTime
import Metal

guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError( "Failed to get the system's default Metal device." )
}

let arrayLength = (1 << 24)
let bufferSize = arrayLength * MemoryLayout<Float>.size
let randomRange: ClosedRange<Float> = 0...Float(100.0)
print("Starting")


let start = DispatchTime.now()

guard let buffer = device.makeBuffer(bytes: (0..<arrayLength).map { _ in Float.random(in: randomRange) },
                                     length: bufferSize,
                                     options: .storageModeShared) else {
    fatalError( "Failed to make buffer" )
}

let end = DispatchTime.now()
let totalTime = Double(end.uptimeNanoseconds - start.uptimeNanoseconds) /  1_000_000_000.0
print("Total time \(totalTime) seconds.")

The output for this one is:

2022-09-03 08:43:27.183968-0500 computeTest[2988:125455] Metal GPU Frame Capture Enabled
2022-09-03 08:43:27.184253-0500 computeTest[2988:125455] Metal API Validation Enabled
Starting
Total time 10.732446667 seconds.
Program ended with exit code: 0

I've tried a number of others, but this should give a good idea of what is going on. Using GKLinearCongruentialRandomSource from GameplayKit speeds it up by a few percent, but it still doesn't compare to the Objective-C version in the sample code linked above. That entire program runs in less than 1 second on my MacBook Pro.

Try changing the C code to use arc4random() instead of rand(), and see what the slowdown is. If C-using-arc4random() is similar to the speed of the Swift code, then I think we can say that the speed difference is because Swift's Float.random is using the arc4random algorithm, or similar. But if the C code is still faster, there must be other issues involved.

As for how to fix your problem - you can, of course, call the C rand() function from Swift using suitable bridging. Do be aware of the well-known limitations of rand() if you choose to do this.
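Since rand() itself is not callable from Swift (as the question notes), using it would mean adding a small C wrapper and a bridging header. As a rough illustration of calling a C library generator without that step, here is a sketch that uses drand48() instead; this assumes drand48() and srand48() are imported into Swift through Foundation on your platform, which I believe is the case but have not checked against every SDK. The function name makeFloatsWithDrand48 is made up for this example, and drand48() has its own quality limitations, much like rand().

```swift
import Foundation  // re-exports the C library, where srand48() and drand48() live

/// Builds an array of `count` floats in 0...upperBound using the C library's drand48().
/// (Hypothetical helper for illustration; drand48 is a swapped-in stand-in for rand().)
func makeFloatsWithDrand48(count: Int, upperBound: Float, seed: Int) -> [Float] {
    srand48(seed)  // seed the generator's global state
    var result: [Float] = []
    result.reserveCapacity(count)
    for _ in 0..<count {
        // drand48() returns a Double in 0..<1; narrowing to Float may round up to 1.0
        result.append(Float(drand48()) * upperBound)
    }
    return result
}
```

Note that drand48() keeps global state, so this is not safe to call from several threads at once.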

If I were trying to generate lots of pseudo-random numbers very fast, I'd search for SIMD (NEON) code to do it.
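To give a flavor of the SIMD idea without hand-written NEON, here is a sketch that steps four xorshift32 streams at once using the standard library's SIMD4 type. This is my own illustration, not code from the thread: the names Xorshift32x4 and makeRandomFloatsSIMD are hypothetical, xorshift32 is a simple low-quality generator, and whether the compiler emits vector instructions for it (and whether it is actually faster) would need to be checked in an optimized build.

```swift
/// Four independent xorshift32 streams stepped together in one SIMD4 value.
/// (Hypothetical helper for illustration; not part of the original thread.)
struct Xorshift32x4 {
    private var state: SIMD4<UInt32>

    init(seeds: SIMD4<UInt32>) {
        // xorshift gets stuck at zero, so replace any zero lane with a nonzero constant
        var lanes = seeds
        for lane in 0..<4 where lanes[lane] == 0 {
            lanes[lane] = 0x9E37_79B9
        }
        state = lanes
    }

    /// Advances all four lanes with the classic 13 / 17 / 5 xorshift32 steps.
    mutating func next() -> SIMD4<UInt32> {
        var x = state
        x = x ^ (x &<< SIMD4<UInt32>(repeating: 13))
        x = x ^ (x &>> SIMD4<UInt32>(repeating: 17))
        x = x ^ (x &<< SIMD4<UInt32>(repeating: 5))
        state = x
        return x
    }
}

/// Builds an array of `count` floats scaled into 0...upperBound, four lanes per step.
func makeRandomFloatsSIMD(count: Int, upperBound: Float, seeds: SIMD4<UInt32>) -> [Float] {
    var generator = Xorshift32x4(seeds: seeds)
    let scale = upperBound / 16_777_216.0  // upperBound / 2^24
    var result: [Float] = []
    result.reserveCapacity(count)
    while result.count < count {
        // keep the top 24 bits of each lane so the value converts to Float exactly
        let bits = generator.next() &>> SIMD4<UInt32>(repeating: 8)
        // lanes are converted one at a time here for clarity
        for lane in 0..<4 where result.count < count {
            result.append(Float(bits[lane]) * scale)
        }
    }
    return result
}
```

The four seeds should differ from each other, otherwise all lanes produce the same sequence.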

Switching from for in to a while < loop drastically reduces the execution time.

The other thing you're doing there is changing from .append to [idx].

Do you have optimisation enabled? Any other build settings to fiddle with?

That's a good observation! I had done many more tests than shown; the switch happened off-screen ;)

But you know, now that we have found the most dramatic speedup, it's worth revisiting. In the above code, the fastest I was able to achieve was 0.86 seconds (using direct indexing of the buffer). Going back to .append() increases the speed again!

let arrayLength = (1 << 24)
let randomSource = GKLinearCongruentialRandomSource()

var buffer: [Float] = []
buffer.reserveCapacity(arrayLength)

var idx: Int = 0
while idx < arrayLength {
    buffer.append(Float(randomSource.nextInt())/Float(RAND_MAX))
    idx += 1
}

This gives me a loop time of 0.78 seconds. That's a 9% reduction in execution time! This is only possible because I am reserving capacity in the array before starting the loop.
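A third way to fill the array, besides zero-filling then indexing or reserving then appending, is to write straight into the array's uninitialized storage. The sketch below uses the standard library initializer Array(unsafeUninitializedCapacity:initializingWith:); the wrapper name makeFilledArray and its next parameter are made up for illustration, and I have not measured it against the timings above.

```swift
/// Builds an array of `count` floats by writing each element directly into
/// the array's uninitialized storage, with no zero-fill pass and no append calls.
/// (Hypothetical helper for illustration; `next` supplies each value.)
func makeFilledArray(count: Int, next: () -> Float) -> [Float] {
    return Array(unsafeUninitializedCapacity: count) { buffer, initializedCount in
        // baseAddress can be nil for an empty buffer; leave the count at zero then
        guard let base = buffer.baseAddress else { return }
        for index in 0..<count {
            (base + index).initialize(to: next())
        }
        // tell the array how many elements were actually initialized
        initializedCount = count
    }
}
```

Usage would look like makeFilledArray(count: arrayLength) { Float.random(in: 0...100) }, or with any other generator in place of Float.random.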

As for build settings and optimizations, I created a new Xcode project for the Swift version and therefore got the default settings. I left the Apple sample code project with whatever Apple had set as the defaults. No fiddling on my part.
