As part of a hobby project, I'm working on a 2D game engine that will draw each pixel every frame, using a color from a palette. I am looking for a way to do that while maintaining a reasonable frame rate (60fps being the minimum).
Without any game logic in place yet, I am setting each pixel to a value from the palette. I'm currently picking the colour with the pixel index modulo the palette size, to (hopefully) prevent the compiler from applying loop optimisations it could use if every pixel got the same fixed value.
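Since the engine is meant to pick every pixel's colour from a palette, one common layout (an assumption here, not something this post prescribes) is an indexed frame: game logic writes one `UInt8` palette index per pixel, and a single pass converts the indices to `BGRA`. The sketch below uses a hypothetical `resolve(indices:palette:into:)` helper and works through unsafe buffer pointers, whose subscripts are only bounds-checked in debug builds.

```swift
// Minimal sketch of an indexed-colour frame: one palette index (UInt8) per pixel,
// resolved to BGRA colours in a single pass.
struct BGRA {
    let blue: UInt8
    let green: UInt8
    let red: UInt8
    let alpha: UInt8
}

/// Hypothetical helper: resolves palette indices into BGRA pixels.
/// `indices` and `pixels` must have the same count, and every index must be < palette.count
/// (out-of-range indices are not checked in optimized builds).
func resolve(indices: [UInt8], palette: [BGRA], into pixels: inout [BGRA]) {
    precondition(indices.count == pixels.count, "index and pixel buffers must match")
    indices.withUnsafeBufferPointer { src in
        palette.withUnsafeBufferPointer { pal in
            pixels.withUnsafeMutableBufferPointer { dst in
                // In an optimized build this is a plain load, lookup and store per pixel.
                for i in 0 ..< src.count {
                    dst[i] = pal[Int(src[i])]
                }
            }
        }
    }
}
```

Usage would look like `resolve(indices: frameIndices, palette: BGRAPallet, into: &pixels)`, where `frameIndices` and `pixels` are hypothetical per-frame buffers of `screenWidth * screenHeight` elements.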
My very naive implementation of updating the bytes in the pixel array is shown below. On an iPhone 12 Pro, one pass over all pixels takes 43 ms on average, while on a simulator running on an M1 Mac it takes 15 ms. Both are unacceptable: at 60 fps a frame has only about 16.7 ms, so this leaves little or no time for any actual game logic (which will involve far more work than taking the mod of an Int).
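To put the measurements in numbers: a 60 fps target gives 1000 / 60 ≈ 16.7 ms per frame. The 43 ms fill alone caps the device at roughly 23 fps, and the 15 ms fill leaves under 2 ms for everything else. The small calculation below just restates that arithmetic; `frameBudgetMilliseconds(fps:)` is a hypothetical helper, and the timings are the ones reported above.

```swift
// Frame-time budget at a target frame rate, compared with the measured fill times.
func frameBudgetMilliseconds(fps: Double) -> Double {
    return 1000.0 / fps
}

let budget = frameBudgetMilliseconds(fps: 60)  // 1000 / 60 ≈ 16.67 ms per frame
let measured: [(device: String, fillMs: Double)] = [("iPhone 12 Pro", 43.0), ("M1 simulator", 15.0)]

for entry in measured {
    let remaining = budget - entry.fillMs     // negative: the fill alone misses the frame
    let fillOnlyFps = 1000.0 / entry.fillMs   // best case if a frame did nothing but fill
    print("\(entry.device): \(remaining) ms left per frame; fill alone caps at \(fillOnlyFps) fps")
}
```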
I was planning to look into Metal and set up a surface, but clearly the bottleneck here is the CPU, so if I can optimize this code, I could go for a higher-level framework.
Any suggestions on a performant way to write this many bytes much, much faster (parallelisation is not an option)?
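For the synthetic test pattern specifically (pixel `i` gets `BGRAPallet[i % BGRAPallet.count]`), the per-pixel modulo can be avoided entirely on a single thread: write the palette once at the start of the buffer, then repeatedly copy the already-filled prefix forward, doubling the filled region each pass. This is a minimal sketch with a hypothetical `fillRepeating(_:with:)` helper. It relies on the pattern being periodic, so it speeds up this benchmark rather than a real frame whose pixels come from game state. `UnsafeMutableRawPointer.copyMemory(from:byteCount:)` is part of the standard library and is typically lowered to an optimized memory copy.

```swift
struct BGRA {
    let blue: UInt8
    let green: UInt8
    let red: UInt8
    let alpha: UInt8
}

/// Hypothetical helper: fills `pixels` so that pixel i gets palette[i % palette.count],
/// without computing a modulo per pixel.
func fillRepeating(_ pixels: UnsafeMutableBufferPointer<BGRA>, with palette: [BGRA]) {
    guard let base = pixels.baseAddress, !palette.isEmpty else { return }
    let total = pixels.count

    // Seed the start of the buffer with one copy of the palette (or as much as fits).
    let seed = min(palette.count, total)
    for i in 0 ..< seed {
        base[i] = palette[i]
    }

    // Copy the filled prefix [0, chunk) to [filled, filled + chunk). `filled` stays a
    // multiple of palette.count until the last copy, so the pattern stays aligned.
    var filled = seed
    while filled < total {
        let chunk = min(filled, total - filled)
        UnsafeMutableRawPointer(base + filled)
            .copyMemory(from: UnsafeRawPointer(base), byteCount: chunk * MemoryLayout<BGRA>.stride)
        filled += chunk
    }
}
```

Because the filled region doubles on every pass, a full frame needs only about log2(pixel count / palette size) copy calls instead of one modulo per pixel.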
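```swift
import Foundation

struct BGRA
{
    let blue: UInt8
    let green: UInt8
    let red: UInt8
    let alpha: UInt8
}

let BGRAPallet =
[
    BGRA(blue: 124, green: 124, red: 124, alpha: 0xff),
    BGRA(blue: 252, green: 0, red: 0, alpha: 0xff),
    // ... 62 more values in my code, omitting here for brevity
]

// screenWidth and screenHeight (the frame size in pixels) are defined elsewhere in my code.
private func test()
{
    let pixelBufferPtr = UnsafeMutableBufferPointer<BGRA>.allocate(capacity: screenWidth * screenHeight)
    defer { pixelBufferPtr.deallocate() } // release the buffer when the test returns
    let runCount = 1000
    let start = Date.now
    for _ in 0 ..< runCount
    {
        // Write a palette colour to every pixel; the modulo varies the colour per index.
        for index in 0 ..< pixelBufferPtr.count
        {
            pixelBufferPtr[index] = BGRAPallet[index % BGRAPallet.count]
        }
    }
    let elapsed = Date.now.timeIntervalSince(start)
    // Convert to milliseconds before dividing; Int(elapsed) would truncate to whole seconds first.
    print("Average time per run: \(elapsed * 1000 / Double(runCount)) ms")
}
```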
Running this in an optimised build changes the numbers significantly: filling the frame AND generating a CGImage from those raw bytes takes 16 µs (average of a couple of hundred runs)! The difference between an optimized build and a debug build is quite staggering (to me). Of course, that is why profiling is done on release (i.e. optimized) builds.
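When an optimised build is orders of magnitude faster, it can help to confirm that the compiler is not discarding writes whose results are never read. A minimal sketch, assuming Dispatch's `DispatchTime.now().uptimeNanoseconds` as a monotonic clock (`Date` follows the wall clock, which can be adjusted while the test runs): the hypothetical `benchmarkFill(pixelCount:palette:runCount:)` varies each frame by `run`, samples one pixel per run, and checksums the final frame, so the writes feed an observable result.

```swift
import Dispatch

struct BGRA {
    let blue: UInt8
    let green: UInt8
    let red: UInt8
    let alpha: UInt8
}

/// Hypothetical helper: times `runCount` full-frame fills and returns the average
/// time per fill in milliseconds plus a checksum of what was written.
func benchmarkFill(pixelCount: Int, palette: [BGRA], runCount: Int) -> (averageMs: Double, checksum: UInt64) {
    precondition(pixelCount > 0 && !palette.isEmpty && runCount > 0)
    let pixels = UnsafeMutableBufferPointer<BGRA>.allocate(capacity: pixelCount)
    pixels.initialize(repeating: palette[0])
    defer { pixels.deallocate() }

    var checksum: UInt64 = 0
    let start = DispatchTime.now().uptimeNanoseconds  // monotonic, unlike Date
    for run in 0 ..< runCount {
        for index in 0 ..< pixels.count {
            // Offset by `run` so consecutive frames differ.
            pixels[index] = palette[(index + run) % palette.count]
        }
        // Read one pixel back per run so every run feeds the result.
        checksum &+= UInt64(pixels[run % pixels.count].blue)
    }
    let elapsedNs = DispatchTime.now().uptimeNanoseconds - start

    // Read the final frame back so its writes are observable.
    for pixel in pixels {
        checksum &+= UInt64(pixel.blue) &+ UInt64(pixel.green) &+ UInt64(pixel.red)
    }
    return (Double(elapsedNs) / 1_000_000 / Double(runCount), checksum)
}
```

In a Release build this would be called with `pixelCount: screenWidth * screenHeight` and `palette: BGRAPallet`, printing both returned values so the checksum stays live.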