How to populate a pixel buffer much faster?

As part of a hobby project, I'm working on a 2D game engine that will draw each pixel every frame, using a color from a palette. I am looking for a way to do that while maintaining a reasonable frame rate (60fps being the minimum).

Without any game logic in place, I am updating the values of my pixels with some value from the palette. I'm currently taking the mod of an index to (hopefully) prevent the compiler from doing some loop optimisation it could do with a fixed value.

My very naive implementation of updating the bytes in the pixel array goes like this. On an iPhone 12 Pro, each run of updating all pixel values takes 43 ms on average, while on a simulator running on an M1 Mac, it takes 15 ms. Both are unacceptable, as that would leave no time for any additional game logic (which would involve far more operations than taking the mod of an Int).

I was planning to look into Metal and set up a surface, but clearly the bottleneck here is the CPU, so if I can optimize this code, I could go for a higher-level framework.

Any suggestions on a performant way to write this many bytes much, much faster (parallelisation is not an option)?

struct BGRA
{
    let blue: UInt8
    let green: UInt8
    let red: UInt8
    let alpha: UInt8
}

let BGRAPallet =
[
    BGRA(blue: 124, green: 124, red: 124, alpha: 0xff),
    BGRA(blue: 252, green: 0, red: 0, alpha: 0xff),
// ... 62 more values in my code, omitting here for brevity
]

private func test()
{
    let pixelBufferPtr = UnsafeMutableBufferPointer<BGRA>.allocate(capacity: screenWidth * screenHeight)
    let runCount = 1000
    let start = Date.now
    for _ in 0 ..< runCount
    {
        for index in 0 ..< pixelBufferPtr.count
        {
            pixelBufferPtr[index] = BGRAPallet[index % BGRAPallet.count]
        }
    }
    let elapsed = Date.now.timeIntervalSince(start)
    // Convert to milliseconds before truncating; Int(elapsed) drops the
    // sub-second part of the measurement.
    print("Average time per run: \(elapsed * 1000 / Double(runCount)) ms")
    pixelBufferPtr.deallocate()
}
Answered by Joride in 753091022

While waiting for review and approval (seriously?), let me add some relevant info:

Instruments shows that most of the time is being spent in Swift's IndexingIterator.next() function. There is quite a substantial subtree inside it; maybe there is a way to reduce the time spent there?
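For what it's worth, one way to sidestep the iterator is to walk the raw base address with a while loop. A minimal sketch, assuming the same pixelBufferPtr and BGRAPallet as above (this mostly matters in debug builds; -O largely removes the difference):

// Illustrative rewrite of the hot loop: no Range/IndexingIterator, and the
// modulo is replaced by a compare-and-reset (% compiles to a division).
if let base = pixelBufferPtr.baseAddress
{
    let pixelCount = pixelBufferPtr.count
    let paletteCount = BGRAPallet.count
    var index = 0
    var paletteIndex = 0
    while index < pixelCount
    {
        base[index] = BGRAPallet[paletteIndex]
        index += 1
        paletteIndex += 1
        if paletteIndex == paletteCount
        {
            paletteIndex = 0
        }
    }
}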

Important to add:

let screenWidth: Int = 256
let screenHeight: Int = 240

Right now you're running in O(n^2), and as your screen size gets larger (1080p vs 2K vs 4K vs 8K), your runtime grows accordingly. Moving this to Metal or OpenGL will be the better option, since GPUs are faster than CPUs at this. Explore changing from for loops to something recursive. The BGRAPallet and the pixel buffer should also be the same size; otherwise the first N palette entries are processed, and then you're left just updating pixelBufferPtr from position 0 of the palette for another million-plus cycles.

Accepted Answer

Running this in an optimised build will change the numbers significantly. Filling the frame AND generating a CGImage from those raw bytes takes 16 µs (average of a couple of hundred runs)! The difference between an optimized build and a debug build is quite staggering (to me). Of course, that is why profiling gets done on release (i.e. optimized) builds.
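For reference, wrapping the raw bytes in a CGImage can look roughly like this. The exact code behind the 16 µs figure isn't shown here, so the function name and bitmap flags below are illustrative:

import CoreGraphics

// Illustrative only: wrap the BGRA pixel memory in a CGContext and have it
// produce a CGImage. "Alpha first, 32-bit little-endian" matches the
// blue-green-red-alpha byte order of the BGRA struct in memory.
func makeImage(from pixelBufferPtr: UnsafeMutableBufferPointer<BGRA>) -> CGImage?
{
    let bitmapInfo = CGImageAlphaInfo.premultipliedFirst.rawValue
        | CGBitmapInfo.byteOrder32Little.rawValue
    guard let context = CGContext(data: pixelBufferPtr.baseAddress,
                                  width: screenWidth,
                                  height: screenHeight,
                                  bitsPerComponent: 8,
                                  bytesPerRow: screenWidth * MemoryLayout<BGRA>.stride,
                                  space: CGColorSpaceCreateDeviceRGB(),
                                  bitmapInfo: bitmapInfo)
    else
    {
        return nil
    }
    return context.makeImage()
}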

I'm working on a 2D game engine that will draw each pixel every frame, using a color from a palette. I am looking for a way to do that while maintaining a reasonable frame rate (60fps being the minimum).

Do it on the GPU.
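As a minimal sketch of what that could look like with Metal: create a bgra8Unorm texture sized to the screen once, then copy the CPU-side pixel buffer into it every frame. All names here are illustrative, and the render pass that draws the textured quad is the usual Metal boilerplate, omitted for brevity:

import Metal

// Illustrative setup: one persistent texture matching the pixel buffer.
let device = MTLCreateSystemDefaultDevice()!
let descriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .bgra8Unorm,
                                                          width: screenWidth,
                                                          height: screenHeight,
                                                          mipmapped: false)
let texture = device.makeTexture(descriptor: descriptor)!

// Per-frame upload of the CPU-side pixels into the texture.
func uploadFrame(from pixelBufferPtr: UnsafeMutableBufferPointer<BGRA>)
{
    texture.replace(region: MTLRegionMake2D(0, 0, screenWidth, screenHeight),
                    mipmapLevel: 0,
                    withBytes: pixelBufferPtr.baseAddress!,
                    bytesPerRow: screenWidth * MemoryLayout<BGRA>.stride)
}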
