AVAudioPlayerNode.play() performance

While investigating some performance issues in my game app I found that AVAudioPlayerNode.play() is taking a long time to complete, sometimes close to 20 milliseconds (more than the ~16.7 ms allotted for a single frame at 60 FPS).


Currently I'm disconnecting and reconnecting player nodes with each play event to ensure they're connected with the proper format for the sound being played. Below is some example code (it uses Xcode's iOS game template and replaces the code in GameViewController.swift):


import AVFoundation
import GLKit
import OpenGLES
import QuartzCore

final class GameViewController: GLKViewController {
    private let engine = AVAudioEngine()
    private let player = AVAudioPlayerNode()
    private var context: EAGLContext?
    private var buffer: AVAudioPCMBuffer?

    override func viewDidLoad() {
        // General set up.
        super.viewDidLoad()
        context = EAGLContext(api: .openGLES2)
        let view = self.view as! GLKView
        view.context = context!

        // Load audio buffer.
        let path = Bundle.main.path(forResource: "test.wav", ofType: nil)!
        let url = URL(fileURLWithPath: path)
        do {
            let file = try AVAudioFile(forReading: url)
            buffer = AVAudioPCMBuffer(
                pcmFormat: file.processingFormat,
                frameCapacity: AVAudioFrameCount(file.length)
            )
            try file.read(into: buffer!)
            print(buffer!.format)
        } catch {
            print("Failed to load audio file: \(error)")
        }

        // Set up and start audio engine.
        engine.attach(player)
        engine.connect(player, to: engine.mainMixerNode, format: buffer!.format)
        do {
            try engine.start()
        } catch {
            print("Failed to start audio engine: \(error)")
        }
    }

    override func glkView(_ view: GLKView, drawIn rect: CGRect) {
        glClear(GLbitfield(GL_COLOR_BUFFER_BIT))
    }

    override func touchesBegan(_ touches: Set<UITouch>, with event: UIEvent?) {
        // Play sound and time call to play().
        // Reconnect the player with the buffer's format, as described above.
        engine.disconnectNodeOutput(player)
        engine.connect(player, to: engine.mainMixerNode, format: buffer!.format)
        let startTime = CACurrentMediaTime()
        player.play()
        print(CACurrentMediaTime() - startTime)
        player.scheduleBuffer(buffer!)
    }
}


I'm testing this on an iPad Mini running iOS 9.3.5. The format of the test audio file is mono, 44.1 kHz, Float32.


I can think of three possibilities here:


- I'm doing something wrong that's causing play() to execute slowly.

- play() isn't meant to be used in a real-time context, but rather only during setup.

- There's some issue with the AV audio framework that's causing the behavior.


Can anyone help clarify this? Are these long execution times (e.g. ~20 ms) expected behavior for the play() function?

Replies

Seems like excessive complexity -- why do you need to use an audio engine? That's really only for audio processing, and it looks like you're just playing audio. For good class design, your sounds should be associated with their corresponding game elements. Then you can speed up their initial playback by loading each one from its wav URL and calling prepareToPlay(); after that, play() will start quickly. You can also set currentTime and then call pause(), which will cause the next play() to execute very fast.
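
In other words, something like this minimal sketch using AVAudioPlayer (the GameSound wrapper and its API are purely illustrative, not an established pattern):

import AVFoundation

// Hypothetical per-sound wrapper around AVAudioPlayer, as suggested above.
final class GameSound {
    private let player: AVAudioPlayer

    init(url: URL) throws {
        player = try AVAudioPlayer(contentsOf: url)
        player.prepareToPlay() // Preload buffers so the first play() starts quickly.
    }

    func play() {
        player.play()
    }

    func stop() {
        // Rewind and pause rather than stop(); this keeps the player primed
        // so the next play() executes very fast.
        player.currentTime = 0
        player.pause()
    }
}

Each game element would then own its GameSound instances and call play() on them directly.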

First, let me say thanks for responding. I appreciate your taking the time to do so.


However, I must admit I don't quite understand your reply. You asked why I'm using an audio engine - I'm not sure what you mean by this. Are you saying I shouldn't be using the AVAudioEngine framework? If so, what alternative did you have in mind?


As for your comment about class design and associating sounds with game elements, I'm not sure if that's right in the general case. Based on what I've seen elsewhere and done in the past, typically a game sound system will have some number of channels available, and sound effects will be played on the first available channel. This is the way I used to do it with OpenAL, and I don't recall ever dealing with these kinds of issues with that API.


Maybe what you're saying though is that this approach doesn't really work with AVAudioEngine and isn't how AVAudioEngine is intended to be used. If the only option is to keep a dedicated AVAudioPlayerNode instance pre-warmed and ready to go for every sound you might want to play, then that's certainly what I'll do. It does seem counterintuitive though. In my experience channels and sounds are orthogonal (for example, a game might have 16 channels but 50 sound effects), but it sounds like you're saying that with AVAudioEngine there needs to be a 1-to-1 correspondence between channels and sounds?


Again, thanks for responding to my post. If you feel so inclined, I'd certainly be interested in any clarification you could offer with respect to your earlier reply.


Edit: After re-reading your reply, based on your mention of prepareToPlay() it sounds like maybe you're recommending I use AVAudioPlayer rather than AVAudioEngine. If so, I'm not sure that's current best practice for game audio in iOS. Apple used to recommend OpenAL for low-latency applications such as games, and my understanding is that AVAudioEngine is intended to supersede OpenAL. Everything else I've seen and read (including the WWDC presentation on AVAudioEngine) seems to suggest that AVAudioEngine is currently the recommended solution for game audio in iOS. (If I'm wrong about any of this, I hope someone will let me know.) In any case, the question remains open.

Just posting a follow-up to this in case anyone else encounters the same issue and finds this thread.


The usual caveats apply: I haven't tested this in iOS 10, and this behavior may have changed since 9.3.5. It's also possible I'm making a mistake in my code, but if so I haven't been able to spot it.


With those caveats in mind, my provisional conclusion is that, although AVAudioEngine technically supports changing node configurations dynamically during rendering, this functionality isn't intended for real-time use (e.g. in a game application during gameplay). Connecting a player node and calling play() seems to incur a cost (approaching 20 milliseconds in some cases) that isn't compatible with a real-time simulation.


The solution I ended up implementing was to create a fixed number of mono and stereo channels up front (each of which uses a single AVAudioPlayerNode instance) and never change anything after that. It's not as flexible or elegant as I'd like, but it does solve the performance problem, as playback on a pre-existing player node with no configuration changes seems to have no performance issues.
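
For reference, here's a minimal sketch of that approach. The SoundChannelPool name, the round-robin channel selection, and the 44.1 kHz standard (deinterleaved Float32) formats are my own simplifications; a real implementation would also track which channels are busy and match buffer formats more carefully.

import AVFoundation

// Fixed pool of pre-connected player nodes; nothing is reconfigured after init.
final class SoundChannelPool {
    private let engine = AVAudioEngine()
    private var monoChannels: [AVAudioPlayerNode] = []
    private var stereoChannels: [AVAudioPlayerNode] = []
    private var nextMono = 0
    private var nextStereo = 0

    init(monoCount: Int, stereoCount: Int) throws {
        for _ in 0..<monoCount {
            monoChannels.append(makeChannel(channelCount: 1))
        }
        for _ in 0..<stereoCount {
            stereoChannels.append(makeChannel(channelCount: 2))
        }
        try engine.start()
        // Pay the play() cost once per node here, during setup,
        // rather than during gameplay.
        for node in monoChannels + stereoChannels {
            node.play()
        }
    }

    private func makeChannel(channelCount: AVAudioChannelCount) -> AVAudioPlayerNode {
        let node = AVAudioPlayerNode()
        let format = AVAudioFormat(standardFormatWithChannels: channelCount,
                                   sampleRate: 44100)!
        engine.attach(node)
        engine.connect(node, to: engine.mainMixerNode, format: format)
        return node
    }

    // Schedules a buffer on the next channel whose channel count matches.
    // The node is already playing, so the buffer starts immediately.
    func play(_ buffer: AVAudioPCMBuffer) {
        if buffer.format.channelCount == 1 {
            monoChannels[nextMono].scheduleBuffer(buffer)
            nextMono = (nextMono + 1) % monoChannels.count
        } else {
            stereoChannels[nextStereo].scheduleBuffer(buffer)
            nextStereo = (nextStereo + 1) % stereoChannels.count
        }
    }
}

With this setup, gameplay code only ever calls scheduleBuffer(), which hasn't shown the same cost in my testing.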


Edit: I was finally able to test this in iOS 10.2, and it doesn't look like this behavior has changed.