Playing files simultaneously in AVAudioEngine

I'm trying to write an OS X app that can play a list of files simultaneously and in perfect sync. A list can contain from 6 to...say 20 files, and the files are wav files, perhaps 40 to 80 MB each.


I thought AVAudioEngine would be a good fit for this, but now I've hit some roadblocks, which is why I'm reaching out to you good people of the internet.


Setup

As said, I use AVAudioEngine with a list of AVAudioPlayerNode instances.


To feed the AVAudioPlayerNodes I use an NSURL pointing to a file. That URL is used to create an AVAudioFile, which I keep as a property.


When the user presses play, I calculate a startTime of type AVAudioTime that lies some 300 ms in the future, to ensure that all files are ready and start simultaneously. I then loop through my list of AVAudioPlayerNodes and call


player.scheduleFile(audioFile, atTime: startTime, completionHandler: nil)
player.play()


on each.
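To make this concrete, a stripped-down sketch of that part would look roughly like this (Swift of that era; players and audioFiles stand in for my actual properties, and the start time here is simply the current host time plus the 300 ms margin):

import AVFoundation

// Stand-in sketch: one shared start time ~300 ms ahead of "now",
// then schedule every file against it and start each player.
let delaySeconds = 0.3
let nowSeconds = AVAudioTime.secondsForHostTime(mach_absolute_time())
let startTime = AVAudioTime(hostTime: AVAudioTime.hostTimeForSeconds(nowSeconds + delaySeconds))

for i in 0 ..< players.count {
    players[i].scheduleFile(audioFiles[i], atTime: startTime, completionHandler: nil)
    players[i].play()
}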


Problem

This works...if I run it on my shiny new MacBook Pro with flash storage. If I run the app on an older machine with a spinning hard disk, the sound goes out of sync; I'm guessing the 300 ms delay is too short for each AVAudioPlayerNode to read in its file and get ready in time.


Q: "Why don't you just increase the delay then?"


A: Because that doesn't seem like the right way to go.


Introducing buffers

I then tried to introduce an AVAudioPCMBuffer for each of the files, so instead of scheduling a file, I read each file into a buffer like so:


// frameCapacity and frameCount take an AVAudioFrameCount, so cast the file's length
let frameCount = AVAudioFrameCount(audioFile.length)
var buffer = AVAudioPCMBuffer(PCMFormat: audioFile.processingFormat, frameCapacity: frameCount)
audioFile.readIntoBuffer(buffer, frameCount: frameCount, error: nil)


And in my play method, I scheduled the buffer, like so:


player.scheduleBuffer(buffer, atTime: startTime, options: .allZeros, completionHandler: nil)
player.play()


This works...but it consumes a lot of RAM. 15 wav files of approx. 70 MB each consume about 1 GB of RAM.


Question

So my question is...what to do? Is there some way that I can just use AVAudioFile and be sure that it will be buffered fast enough, even on an old hard disk? Or should I use buffers? Or can this problem not be solved with AVAudioEngine? Should I use Core Audio instead, perhaps an AUGraph? (and if so...will that solve my problem?)

Replies

I suspect the time is being spent opening each file up, buffering data, etc. (which is done on your behalf, as opposed to you doing the work when you create, load and schedule your own buffers), and there might even be a difference here on the same machine between solid state and a spinning disk.


The engine's player nodes are a wrapper around the ScheduledSoundPlayer audio unit, so you could use the C APIs yourself and build your own graph (with the flexibility to maybe use kAudioUnitSubType_AudioFilePlayer <AUAudioFilePlayer - Using the Audio File Player Audio Unit>), which may give you a little more control over the nitty-gritty. But 300 ms to do all that work may indeed not be enough time for the number of files and the hardware you're using.


Have you tried calling prepareWithFrameCount: along with scheduleSegment:, creating a start time and then calling playAtTime:?


Additionally, many developers have informally asked for an easy API to reliably start several players together or keep players in sync, but engineering tells me they haven't seen bug reports asking for these features. So please file enhancement requests for these features <bugreport.apple.com>.

Thank you so much for your reply.


I had already been looking at AUGraph but was hoping it wouldn't have to come to that 😉. When you're used to AVAudioEngine and Swift, AUGraph looks a bit rough...I think.


But I've managed to hack together a program that uses an AUGraph and some kAudioUnitSubType_AudioFilePlayer nodes, and so far it is working better. I can plug in my external 5400 RPM hard disk and start playing straight away, and it doesn't go out of sync, which is fine. I'll take some time to convert my code from AVAudioEngine to AUGraph and see how it goes.


Regarding the feature request: consider it done (radar 22245529).


Once again thanks

Interesting...I'm curious what happens if you throw together something using AVAudioEngine that makes use of AVAudioPlayerNode's:


- (void)scheduleSegment:(AVAudioFile *)file startingFrame:(AVAudioFramePosition)startFrame frameCount:(AVAudioFrameCount)numberFrames atTime:(AVAudioTime * __nullable)when completionHandler:(AVAudioNodeCompletionHandler __nullable)completionHandler;


Do a prepareWithFrameCount: first for each node, then call playAtTime:. Theoretically this should do a kAudioUnitProperty_ScheduledFilePrime underneath, loading up resources and getting everything set to go, so that when it's time to play we just play at the future time - which by then is the now - and start up in sync.

Sorry for the late reply.


I tried to use


- (void)scheduleSegment:(AVAudioFile *)file startingFrame:(AVAudioFramePosition)startFrame frameCount:(AVAudioFrameCount)numberFrames atTime:(AVAudioTime * __nullable)when completionHandler:(AVAudioNodeCompletionHandler __nullable)completionHandler;


But alas...same result. The app plays for a good 20 seconds, then shows the spinning beach ball of death for some time, after which the app resumes...but out of sync.


In the meantime I've continued porting from AVAudioEngine to AUGraph...it still seems to work, so I guess that's the way to go (I'm slowly starting to get comfortable calling Core Audio functions from Swift, so that's a bonus :-)).

No worries.


Did you prepare first for every AVAudioPlayerNode by calling prepareWithFrameCount:?


- (void)prepareWithFrameCount:(AVAudioFrameCount)frameCount;


If yes, and you still got that staggered start-up (which doesn't happen when using the C API), can you please file a bug and attach your test case -- the AVAudioEngine team would like to see what's going on here. Thanks!

I did, yes. In my init method I do


let frameCount: AVAudioFrameCount = AVAudioFrameCount(audioFile.length)
player.prepareWithFrameCount(frameCount)


and then in play I call


let frameCount: AVAudioFrameCount = AVAudioFrameCount(audioFile.length)
player.scheduleSegment(audioFile, startingFrame: 0, frameCount: frameCount, atTime: nil, completionHandler: nil)
player.playAtTime(atTime)


But same result. The app plays for a good 20 seconds, then shows the spinning beach ball of death for some time, after which the app resumes...but out of sync.


I'll be happy to provide the AVAudioEngine team with an example, but I'm starting to get the feeling that my problem must be somewhere else here on my side of the fence :-).


Perhaps some threading issue, with me calling time-consuming methods on the AVAudioEngine from the main thread, or something like that. Plus I think I have to cut out large parts of the project to narrow it down to a simplified test case. This is just to say that it may take me some time to create a test case for you, but I'll give it a shot of course.
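For reference, moving the priming and scheduling off the main thread would look roughly like this (GCD calls of that Swift era; players, audioFiles and startAllPlayersInSync are placeholder names, the last one standing in for whatever computes the shared start time):

// Sketch only: prime and schedule on a background queue, then kick off
// playback from the main queue once everything is ready.
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0)) {
    for i in 0 ..< self.players.count {
        let audioFile = self.audioFiles[i]
        let frameCount = AVAudioFrameCount(audioFile.length)
        self.players[i].prepareWithFrameCount(frameCount)
        self.players[i].scheduleSegment(audioFile, startingFrame: 0, frameCount: frameCount, atTime: nil, completionHandler: nil)
    }
    dispatch_async(dispatch_get_main_queue()) {
        // Placeholder: compute the shared start time and call playAtTime on every node.
        self.startAllPlayersInSync()
    }
}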


Once again, thank you for your time. I'll keep you posted when I've filed a radar.

Hi,


I've just filed a new bug about this: 23592826


Tell us if we can expect to stay at the AVAudioEngine level, or if we must discover Audio Unit magic!

Don't schedule your files (or segments or buffers) with a delayed start time - just pass nil - and schedule them (e.g. in a setup/prepare method) long before actually starting the players.


Then, instead, delay the players themselves at start-up...


If your engine is already running, you have the lastRenderTime property from AVAudioNode - your player's superclass. This is your ticket to 100% sample-frame-accurate sync...



    AVAudioFormat *outputFormat = [playerA outputFormatForBus:0];

    const float kStartDelayTime = 0.0; // seconds - in case you wanna delay the start

    AVAudioFramePosition startSampleTime = playerA.lastRenderTime.sampleTime;

    AVAudioTime *startTime = [AVAudioTime timeWithSampleTime:(startSampleTime + (kStartDelayTime * outputFormat.sampleRate)) atRate:outputFormat.sampleRate];

    [playerA playAtTime: startTime];
    [playerB playAtTime: startTime];
    [playerC playAtTime: startTime];
    [playerD playAtTime: startTime];
    [player...
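Since you're working in Swift, the same idea in rough (Swift 2-era) Swift - playerA, playerB etc. being your attached AVAudioPlayerNode instances, with the engine already running:

    // Rough Swift sketch of the snippet above - the engine must already be
    // running, otherwise lastRenderTime has no valid sample time.
    let outputFormat = playerA.outputFormatForBus(0)
    let kStartDelayTime = 0.0 // seconds - in case you wanna delay the start

    let startSampleTime = playerA.lastRenderTime!.sampleTime
    let startTime = AVAudioTime(sampleTime: startSampleTime + AVAudioFramePosition(kStartDelayTime * outputFormat.sampleRate), atRate: outputFormat.sampleRate)

    playerA.playAtTime(startTime)
    playerB.playAtTime(startTime)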



By the way - you can achieve the same 100% sample-frame accurate result with the AVAudioPlayer class...



    NSTimeInterval startDelayTime = 0.0; // seconds - in case you wanna delay the start

    NSTimeInterval now = playerA.deviceCurrentTime;

    NSTimeInterval startTime = now + startDelayTime;

    [playerA playAtTime: startTime];
    [playerB playAtTime: startTime];
    [playerC playAtTime: startTime];
    [playerD playAtTime: startTime];
    [player...



With no startDelayTime, the first 100-200 ms of all players will get clipped off, because the start command takes its time to get through the run loop even though the players have already started (well, been scheduled) 100% in sync at now. But with a startDelayTime of 0.25 you are good to go. And never forget to prepareToPlay your players in advance, so that at start time no additional buffering or setup has to be done - just starting them guys ;-)
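In Swift, that preparation plus the delayed common start would look roughly like this (playerA...playerD being plain AVAudioPlayer instances here):

    // Rough Swift sketch: prepare everything up front, then start all players
    // a quarter of a second in the future on the shared device clock.
    let players = [playerA, playerB, playerC, playerD]
    for player in players {
        player.prepareToPlay() // pre-buffer now so nothing has to be set up lazily at start time
    }
    let startTime = playerA.deviceCurrentTime + 0.25
    for player in players {
        player.playAtTime(startTime)
    }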


I have my iPhone 4s playing lots of stereo tracks at the same time for hours in perfect sync ;-) I even sorted out a one-frame glitch - traced back to a floating-point rounding error! By rounding float seconds to integer frames before calculating any startingFrames and frameCounts, you will stay 100% sample-frame accurate. (See the end of this post.)



ONE ADDITIONAL AUDIO PRO-TIP:

If you really wanna be sure that you have perfect sync, just use your favorite CD-ripped song.wav in 44.1 kHz/16-bit.

Make a copy of it, load it into an audio editor, invert the phase and save it as it now is.


When you now schedule both versions at exactly the same startTime (and of course with the same volume/pan settings) you will hear - SILENCE...


Because of the phase inversion they cancel each other 100%.


If for some reason this perfect cancellation gets disturbed by any out-of-sync issue, you will hear clicks and other noises, or in the case of a frame loss (like dropping a single frame) you will suddenly hear your whole song, but with a flanging and phasing sound, sharp highs and no lows, etc. If somewhere along the way you win a frame back, your players could even go back to perfect sync again. But when dropping more frames (and so widening the gap) this flanging will finally become a very short delay sound as your players drift apart.


If you are curious and just wanna hear how this sounds, force-delay one of two players by one single frame, using the same file on both players. Then add a second frame to your delay-variable and so on...


Later do the same with an in-phase and a phase-inverted version of the song scheduled on two players - just to hear the difference. Now you are an audio pro, and no framework, algorithm or programming language can do any harm to your music... ;-)



You can even do that:


Prepare your setup...


audioSession = [AVAudioSession sharedInstance];
[audioSession setCategory: AVAudioSessionCategoryPlayback error: nil];
[audioSession setActive: YES error: nil];

NSString *soundFilePathA = [[NSBundle mainBundle] pathForResource: @"mySong-original"
                                                              ofType: @"wav"];
NSString *soundFilePathB = [[NSBundle mainBundle] pathForResource: @"mySong-phase-inversed"
                                                              ofType: @"wav"];
NSURL *fileURLforPlayerA = [[NSURL alloc] initFileURLWithPath: soundFilePathA];
NSURL *fileURLforPlayerB = [[NSURL alloc] initFileURLWithPath: soundFilePathB];
fileForPlayerA = [[AVAudioFile alloc] initForReading:fileURLforPlayerA error:nil];
fileForPlayerB = [[AVAudioFile alloc] initForReading:fileURLforPlayerB error:nil];

engine = [[AVAudioEngine alloc]init];
playerA = [[AVAudioPlayerNode alloc]init];
playerB = [[AVAudioPlayerNode alloc]init];
[engine attachNode:playerA];
[engine attachNode:playerB];

mainMixer = [engine mainMixerNode];
[engine connect:playerA to:mainMixer format:fileForPlayerA.processingFormat];
[engine connect:playerB to:mainMixer format:fileForPlayerB.processingFormat];
[engine startAndReturnError:nil];


Now split your song into regions and schedule them one after the other as segments of the original file onto playerA.

On playerB, schedule the phase-inverted version as one whole piece!


// The starting points of the parts/regions of your song (in seconds)

NSTimeInterval anchorPart1 = 0;     // This is the guitar intro of the song (4 bars)
NSTimeInterval anchorPart2 = 13.13; // Now the whole band comes along (4 bars)
NSTimeInterval anchorPart3 = 24.95; // Heavy guitars join the team (4 bars)
NSTimeInterval anchorPart4 = 36.78; // Backing down to make space for the vocals in part 5 (4 bars)
NSTimeInterval anchorPart5 = 48.45; // Verse vocals (8 bars)
NSTimeInterval anchorPart6 = 71.77; // Bridge vocals (8 bars)

// Now do the casting from seconds to frames - so that the floating-point unit
// doesn't goof you up when calculating the frameCounts below from the NSTimeInterval seconds

AVAudioFramePosition positionPart1 = (anchorPart1 * fileForPlayerA.fileFormat.sampleRate);
AVAudioFramePosition positionPart2 = (anchorPart2 * fileForPlayerA.fileFormat.sampleRate);
AVAudioFramePosition positionPart3 = (anchorPart3 * fileForPlayerA.fileFormat.sampleRate);
AVAudioFramePosition positionPart4 = (anchorPart4 * fileForPlayerA.fileFormat.sampleRate);
AVAudioFramePosition positionPart5 = (anchorPart5 * fileForPlayerA.fileFormat.sampleRate);
AVAudioFramePosition positionPart6 = (anchorPart6 * fileForPlayerA.fileFormat.sampleRate);

// Now the scheduling of the parts/regions on playerA - all atTime:nil
// All startingFrame positions, and the frameCounts derived from them, are integers now because of the casting above

[playerA scheduleSegment:fileForPlayerA startingFrame:positionPart1 frameCount:(UInt32)((positionPart2 - positionPart1) + 0) atTime:nil completionHandler:nil];
[playerA scheduleSegment:fileForPlayerA startingFrame:positionPart2 frameCount:(UInt32)((positionPart3 - positionPart2) + 0) atTime:nil completionHandler:nil];
[playerA scheduleSegment:fileForPlayerA startingFrame:positionPart3 frameCount:(UInt32)((positionPart4 - positionPart3) + 0) atTime:nil completionHandler:nil];
[playerA scheduleSegment:fileForPlayerA startingFrame:positionPart4 frameCount:(UInt32)((positionPart5 - positionPart4) + 0) atTime:nil completionHandler:nil];
[playerA scheduleSegment:fileForPlayerA startingFrame:positionPart5 frameCount:(UInt32)((positionPart6 - positionPart5) + 0) atTime:nil completionHandler:nil];
[playerA scheduleSegment:fileForPlayerA startingFrame:positionPart6 frameCount:(UInt32)(fileForPlayerA.length - positionPart6) atTime:nil completionHandler:nil];

// And the scheduling on playerB with the phase-inversed file in all its glory

[playerB scheduleFile:fileForPlayerB atTime:nil completionHandler:nil];



If the engine messes up even one single sample-frame on any of these cues, the perfect SILENCE will be broken!!!


Now you can schedule another pair of in-/out-of-phase players in parallel to see how your engine and device are doing with 4 stereo tracks, then two more... - until you hit the limit of your device ;-)



--------------------------------------------------------------------------



BTW: As already mentioned above - if you don't do the casting on the 6 cue points above, you will notice that regions sometimes drop a frame due to a floating-point rounding error when calculating ---> frameCount:(UInt32)(positionPart3 - positionPart2) with NSTimeIntervals, because the float values of both the start and the end position lie somewhere between actual frame boundaries (44100 frames/samples per second). From that moment on the two players are one sample off-sync and you will suddenly hear your audio. Why the frame drop?


Just imagine e.g.


variable1 = round(10.7);           // variable1 = 11.0;
variable2 = round(10.7);           // variable2 = 11.0;
variable3 = variable1 + variable2; // variable3 = 22.0;


When assigning (casting) this float variable3 to an integer variable (like frameCount above) the result is


frameCount = (UInt32)variable3;      // frameCount = 22;


Without the prior rounding (casting), though:


variable1 = (10.7);                // variable1 = 10.7;
variable2 = (10.7);                // variable2 = 10.7;
variable3 = variable1 + variable2; // variable3 = 21.4;


When you now assign (cast) variable3 to frameCount the result is


frameCount = (UInt32)variable3;      // frameCount = 21;


So you dropped one frame and the player is out of sync...



----

Sorry...it's been quiet from me on this one 😊


In the end I dropped down to Audio Unit land and created an AUGraph with AudioFilePlayer nodes, and that did the trick for me; it is now used in our macOS app and our iOS app.


Thank you once again for your time, and sorry that I never got back to you.