AVAudioEngine & render callback

Hello all,


I'm currently trying to integrate AVAudioEngine into my apps, but I'm stuck on what seems to be the most basic question!


Here is my problem:


I have a huge sound processing and generating program with a mixed C/Objective-C core. Until now I've been using an AVAudioSession and an AUGraph (with a mixer unit, preferred and maximum frame counts, etc.), and I set a render callback with AUGraphSetNodeInputCallback, which links to my core code, sends audio input to it if needed, and fills up the AudioBufferList for the hardware output.
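
Schematically, the kind of setup I mean looks like this (a simplified Swift sketch, not my actual C/Objective-C code):

    import AudioToolbox

    // Build a minimal graph with a RemoteIO node and feed its input from a render callback.
    var graph: AUGraph?
    NewAUGraph(&graph)

    var ioDesc = AudioComponentDescription(componentType: kAudioUnitType_Output,
                                           componentSubType: kAudioUnitSubType_RemoteIO,
                                           componentManufacturer: kAudioUnitManufacturer_Apple,
                                           componentFlags: 0, componentFlagsMask: 0)
    var ioNode = AUNode()
    AUGraphAddNode(graph!, &ioDesc, &ioNode)
    AUGraphOpen(graph!)

    // Classic C render callback: in the real app this calls into the core code,
    // which fills the AudioBufferList; here it just writes silence.
    let renderCallback: AURenderCallback = { _, _, _, _, _, ioData in
        guard let ioData = ioData else { return noErr }
        for buffer in UnsafeMutableAudioBufferListPointer(ioData) {
            memset(buffer.mData, 0, Int(buffer.mDataByteSize))
        }
        return noErr
    }
    var callback = AURenderCallbackStruct(inputProc: renderCallback, inputProcRefCon: nil)
    AUGraphSetNodeInputCallback(graph!, ioNode, 0, &callback)

    AUGraphInitialize(graph!)
    AUGraphStart(graph!)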


I now want to transfer all the graph management from the AUGraph to AVAudioEngine, in order to simplify audio conversion and implement new features with the new AVAudioNode system. But for this, I need to plug my render callback somewhere. If I understood correctly what was presented in the WWDC videos about Core Audio and AVAudioEngine (WWDC 2014 sessions 501 & 502, WWDC 2015 sessions 507 and 508), I need to output my generated audio into an AVAudioPCMBuffer and read it with an AVAudioPlayerNode. So my question is:

Where and how do I plug in my render callback function so that my core code can fill the AVAudioPCMBuffer?


I've been searching the web about this for hours, and all I found is how to use AVAudioPlayerNode to play a file, or how to read an AVAudioPCMBuffer generated in advance, but I can't find how to fill an AVAudioPCMBuffer with a custom render callback and read it with an AVAudioPlayerNode synchronously and in real time inside an AVAudioEngine.


Any advice?


Thank you all


Thomas

Replies

Dear all,


Nobody?... 😟


In case it helps, here is my question put more simply:


How can I use my own audio processing/generating code within an AVAudioEngine?


All the examples I found online explain how to use AVAudioPlayer to play a file or a pre-computed buffer, or how to use the built-in AVAudioUnitEffects (Delay, Distortion, EQ, Reverb) to apply FX to the mic input (and then output it through headphones or store it in a file). But I can't find how to integrate my own process, with its own render callback, into this architecture...


Please help!!!


Thanks


Thomas

Not sure if this is what you're asking, but you can add a render callback on the last node that's pulled in the AVAudioEngine chain. The "correctness" of doing this is unclear, but it does work. Just set the kAudioUnitProperty_SetRenderCallback property on the underlying audio unit of the node you want to pass your audio into; that way you can generate your audio into the buffers in the callback. The documentation is far too sparse, but from my experience AVAudioEngine is really just a higher-level, simpler way to build AUGraphs; underneath, it all still runs at the audio component level. In AVAudioEngine, most but not all nodes have an audioUnit property; it depends on what the node is. The mixer node, for example, doesn't, but you generally don't need a mixer, and I think it isn't even instantiated until you actually try to use it (it's created lazily when you access the mainMixerNode property).
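
For example, something along these lines (a rough Swift sketch; here I target the engine's output node, one of the few nodes that exposes its underlying AudioUnit):

    import AVFoundation
    import AudioToolbox

    let engine = AVAudioEngine()
    _ = engine.mainMixerNode          // touching this builds a minimal mixer -> output graph

    // Classic C render callback; no captures allowed, state comes in via inRefCon.
    let renderCallback: AURenderCallback = { _, _, _, _, _, ioData in
        guard let ioData = ioData else { return noErr }
        for buffer in UnsafeMutableAudioBufferListPointer(ioData) {
            // A real app would call into its own DSP code here; this just writes silence.
            memset(buffer.mData, 0, Int(buffer.mDataByteSize))
        }
        return noErr
    }

    // Install the callback on the audio unit behind the engine's output node.
    if let outputUnit = engine.outputNode.audioUnit {
        var callbackStruct = AURenderCallbackStruct(inputProc: renderCallback,
                                                    inputProcRefCon: nil)
        AudioUnitSetProperty(outputUnit,
                             kAudioUnitProperty_SetRenderCallback,
                             kAudioUnitScope_Input,
                             0,                                    // input element 0
                             &callbackStruct,
                             UInt32(MemoryLayout<AURenderCallbackStruct>.size))
    }

    try? engine.start()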


Hope this helps. Anyone feel free to correct me because I'm still getting my head around all the latest changes too.

Hello,


Thanks for this trick. As you say, the "correctness" of this seems rather unclear.


Moreover, since I asked this question, I finally understood something important: for audio processing apps on iOS, AVAudioEngine in its iOS 8.0 version seems rather limited. However, it makes a lot more sense with the Audio Unit v3 API introduced in iOS 9.0, which lets you use custom AudioUnits on iOS, and therefore take real advantage of the AVAudioEngine architecture even if you have a lot of custom audio processing code.


As I always keep my apps backward-compatible with the previous iOS version, I'll explore that next year 😉

Hi all,


Two years later, I still have the same problem.


I've recently heard that AUGraph will be deprecated in 2018, so I'm looking for a way to use my own code to generate and process audio data manually within the new AVAudioEngine framework.


Does anybody have a clue how we are supposed to do that?


Thank you all for your help


😎 Tom 😎

I'm not 100% sure what your goal is, but you should be able to create AVAudioUnits, which will be backed by AUAudioUnits, and in an AUAudioUnit you can provide your own render block. Additionally, as of iOS 11 you have manual rendering options in AVAudioEngine (https://developer.apple.com/videos/play/wwdc2017/501/)
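
For the manual rendering part, the offline mode looks roughly like this (a sketch; the format and frame counts are arbitrary):

    import AVFoundation

    func renderOneSecondOffline() throws {
        let engine = AVAudioEngine()
        let player = AVAudioPlayerNode()
        engine.attach(player)

        let format = AVAudioFormat(standardFormatWithSampleRate: 48000, channels: 2)!
        engine.connect(player, to: engine.mainMixerNode, format: format)

        // Detach the engine from the hardware and pull it manually (iOS 11+).
        try engine.enableManualRenderingMode(.offline,
                                             format: format,
                                             maximumFrameCount: 4096)
        try engine.start()
        player.play()

        let buffer = AVAudioPCMBuffer(pcmFormat: engine.manualRenderingFormat,
                                      frameCapacity: engine.manualRenderingMaximumFrameCount)!

        while engine.manualRenderingSampleTime < 48_000 {          // about one second
            let status = try engine.renderOffline(engine.manualRenderingMaximumFrameCount,
                                                  to: buffer)
            if status == .success {
                // Consume `buffer` here: write it to a file, analyse it, etc.
            }
        }
        engine.stop()
    }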

I also heard the bit in a WWDC session about AUGraph being deprecated. I wanted a future-proof solution using AVAudioEngine instead. So I wrote a test app that instantiates an AUAudioUnit subclass with a callback block. The unit is then connected to AVAudioEngine to play the generated sound samples. Seems to work under iOS (device and Simulator), and with fairly low latency (small callback buffers). The source code for my test app is posted on github (search for hotpaw2). There's a hotpaw2 github gist on recording audio as well. Let me know if any of that helps, or if there is a better way to do this.
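
In outline, the approach looks something like this (a fresh minimal sketch rather than the actual auv3test5 source; the component identifiers are made up):

    import AVFoundation
    import AudioToolbox

    // Minimal tone-generator AUAudioUnit with its own render block.
    class ToneGeneratorAU: AUAudioUnit {
        private var outputBusArray: AUAudioUnitBusArray!
        private var phase = 0.0

        override init(componentDescription: AudioComponentDescription,
                      options: AudioComponentInstantiationOptions = []) throws {
            try super.init(componentDescription: componentDescription, options: options)
            let format = AVAudioFormat(standardFormatWithSampleRate: 48000, channels: 2)!
            outputBusArray = AUAudioUnitBusArray(audioUnit: self, busType: .output,
                                                 busses: [try AUAudioUnitBus(format: format)])
        }

        override var outputBusses: AUAudioUnitBusArray { outputBusArray }

        override var internalRenderBlock: AUInternalRenderBlock {
            // Real-time context: production code should avoid capturing self and
            // must not allocate, lock, or call Objective-C here.
            return { _, _, frameCount, _, outputData, _, _ in
                let buffers = UnsafeMutableAudioBufferListPointer(outputData)
                let increment = 2.0 * Double.pi * 440.0 / 48000.0
                for frame in 0..<Int(frameCount) {
                    let sample = Float(sin(self.phase))
                    self.phase += increment
                    for buffer in buffers {
                        // If the host passes nil mData, a full implementation must
                        // supply its own buffers; optional chaining just skips it here.
                        buffer.mData?.assumingMemoryBound(to: Float.self)[frame] = sample
                    }
                }
                return noErr
            }
        }
    }

    // Register the subclass under a made-up component description, wrap it in an
    // AVAudioUnit, and connect it to the engine.
    let desc = AudioComponentDescription(componentType: kAudioUnitType_Generator,
                                         componentSubType: 0x746F6E65,      // 'tone' (made up)
                                         componentManufacturer: 0x64656D6F, // 'demo' (made up)
                                         componentFlags: 0, componentFlagsMask: 0)
    AUAudioUnit.registerSubclass(ToneGeneratorAU.self, as: desc,
                                 name: "Demo: ToneGenerator", version: 1)

    let engine = AVAudioEngine()
    AVAudioUnit.instantiate(with: desc, options: []) { avUnit, _ in
        guard let avUnit = avUnit else { return }
        engine.attach(avUnit)
        engine.connect(avUnit, to: engine.mainMixerNode, format: nil)
        try? engine.start()
    }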

Wow, I missed that, thank you. Block-based real-time audio rendering looks very promising!

AVAudioEngine seems to finally be approaching a "fully-featured" state; now I understand why they want to deprecate AUGraph and AudioToolbox in favor of AVFoundation and AVAudioEngine.

Thank you very much for that!

I'm not used to AVAudioEngine, and the tap in your code puzzled me at first glance. I'm going to check it thoroughly and let you know if I have any questions or comments.

The auv3test5 test app was written to test play-and-record using AVAudioEngine on an iOS device. The tap-on-bus was to make sure that buffers of samples from the microphone could be received during the audio generation test; it's not needed for a play-only audio app.

Hello,


I just spent the last month playing around with AVAudioEngine and custom AVAudioUnits. Thanks to the auv3test5 test app (thank you so much hotpaw2), I finally managed to easily embed custom rendering code inside an AVAudioEngine, for both audio generation and audio processing.


I now want to integrate a custom audio analysis module in my engine. The immediate solution seems to be the tap-on-bus, but I face two problems with it:


PROBLEM 1

I don't understand how to control the buffer size of the tap block. There is a bufferSize input variable on the installTapOnBus function, but, as mentioned in the documentation, "The implementation may choose another size.", and it does in my case (so what's the point of the bufferSize input?). This results in some implementation problems (I don't even know what maximum buffer size to expect). Also, as I use the results of my analysis for graphic visualisations, I can't control the refresh rate of my visualisations, and I generally end up with 10 refreshes per second (a 4800-frame buffer at a 48000 Hz sample rate, even though I asked for a 1024-frame buffer), which is not satisfying.
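
For reference, the tap is installed roughly like this (simplified):

    import AVFoundation

    let engine = AVAudioEngine()
    let mixer = engine.mainMixerNode

    // bufferSize is only a hint: despite asking for 1024 frames, the block is
    // called with whatever size the implementation chooses (4800 frames in my case).
    mixer.installTap(onBus: 0, bufferSize: 1024,
                     format: mixer.outputFormat(forBus: 0)) { buffer, _ in
        let frames = Int(buffer.frameLength)   // not necessarily 1024
        // ... hand the buffer off to the analysis / visualisation code ...
        _ = frames
    }

    try? engine.start()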


PROBLEM 2

I want to make a custom mix of some nodes' output for the analysis module.

- I tried to use a mixer node to do that, but for the tap block to be called, the mixer node's output bus on which the tap is installed must be connected in some way to the main output node. So the only solution I found so far is to plug my "analysis mixer node" into the main mixer node and set its volume to 0, but that's not very satisfying, is it? Also, the documentation of installTapOnBus seems to say that the tap can be installed on an output bus that is not connected to anything except the tap: "This should only be done when attaching to an output bus which is not connected to another node", but in this case my tap block is never called.

- I tried to explore the idea of building some kind of custom AVAudioOutputNode (which has input buses but no output buses), but I didn't find anything about it, and I really don't see how to build one, especially given the problem of scheduling the input calls, which is not a problem when your custom AVAudioNode is connected in some way to the main output node, which takes care of scheduling the rendering calls.


The only solution I found for both of these problems is to build a custom processing node connected to the main mixer but always outputting silence. This way, I can:

- choose a maximumFramesToRender (which at least sets a maximum buffer size for my analysis module),

- use a dedicated mixer node,

- be scheduled by the main output node.

But once again, this seems weird, and I have a useless input bus on my main mixer...


Any ideas, anyone?


Thanks to all 🙂

Regarding Problem 1:


For real-time audio processing, instead of trying to control the buffer size of the tap block, one can instantiate a custom AUAudioUnit effect unit and connect it between the audio engine's inputNode and the main mixer. This allows configuring and processing shorter (lower-latency) audio sample buffers from the microphone than installTap() allows.


I set up my effect unit to pass the shorter input sample buffers from the microphone to my 60 Hz visualization routines (via a lock-free circular buffer), and also to output silence to the mixer. The connection to the mixer is what makes the (hidden?) audio graph pull the unit.
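
In sketch form, such an effect unit could look like this (illustrative only; AnalysisFIFO stands in for whatever lock-free buffer you use, and the boilerplate mirrors any other AUAudioUnit subclass):

    import AVFoundation
    import AudioToolbox

    // Pulls the upstream (microphone) samples, hands them to a side channel for the
    // UI thread, and outputs silence to the mixer.
    final class AnalysisTapAU: AUAudioUnit {
        private var inputBusArray: AUAudioUnitBusArray!
        private var outputBusArray: AUAudioUnitBusArray!

        override init(componentDescription: AudioComponentDescription,
                      options: AudioComponentInstantiationOptions = []) throws {
            try super.init(componentDescription: componentDescription, options: options)
            let format = AVAudioFormat(standardFormatWithSampleRate: 48000, channels: 1)!
            inputBusArray = AUAudioUnitBusArray(audioUnit: self, busType: .input,
                                                busses: [try AUAudioUnitBus(format: format)])
            outputBusArray = AUAudioUnitBusArray(audioUnit: self, busType: .output,
                                                 busses: [try AUAudioUnitBus(format: format)])
        }

        override var inputBusses: AUAudioUnitBusArray { inputBusArray }
        override var outputBusses: AUAudioUnitBusArray { outputBusArray }

        override var internalRenderBlock: AUInternalRenderBlock {
            return { actionFlags, timestamp, frameCount, _, outputData, _, pullInputBlock in
                guard let pullInput = pullInputBlock else { return kAudioUnitErr_NoConnection }

                // Pull the upstream samples directly into our output buffers.
                var pullFlags = AudioUnitRenderActionFlags()
                let err = pullInput(&pullFlags, timestamp, frameCount, 0, outputData)
                if err != noErr { return err }

                for buffer in UnsafeMutableAudioBufferListPointer(outputData) {
                    // AnalysisFIFO.shared.write(buffer)   // hypothetical side channel
                    memset(buffer.mData, 0, Int(buffer.mDataByteSize))   // silence the mixer input
                }
                actionFlags.pointee.insert(.unitRenderAction_OutputIsSilence)
                return noErr
            }
        }
    }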

Hello hotpaw2,


Thank you for your answer 🙂


I came to basically the same conclusion for the moment: instantiate a custom processing unit with either

- a pass-through (in addition to the analysis work) somewhere inside the graph,

- or a silent output (in addition to the analysis work) connected to the main mixer, in the case where the audio signal to be analysed is not meant to be sent to the main output (problem 2: I have a dedicated mixer for the analysis signal).


It works, but it's not very satisfying, as:

- I still don't understand the point of the bufferSize input of the installTapOnBus function, as it doesn't seem to have any impact...

- In my case, I make the main mixer process a useless input (even though I hope the kAudioUnitRenderAction_OutputIsSilence flag I raise on the silent output of my analysis unit is taken into account).


And yes, it seems pretty obvious to me that AVAudioEngine is built on top of AUGraph, and that AVAudioEngine connections rely on the render callback mechanics. I think I read something about this a while back, but I can't find it again. I'll let you know if I do.

TapOnBus seems to be designed to be easier to set up and use for less experienced programmers. A tap seems designed to allow standard Cocoa/Swift/Objective-C programming practices, such as using Swift data types, calling methods, synchronizing accesses, and allocating memory. All this requires more buffering to allow time to do it safely. Thus, a tap seems to ignore requests for a buffer size too small to safely allow for all of the above programmer-friendly practices.


The callback functions and blocks for Audio Units, on the other hand, are called in a real-time context. Safe use of the audio context requires special real-time coding practices (such as deterministic code with no Swift data types, no Objective-C methods, no memory allocation or release, no semaphores or locks, etc.). Thus, a good programmer can get away with asking for 5-millisecond or even sub-millisecond buffer sizes.


As for why one would use mixer inputs (or RemoteIO): they appear to be a good source of high-priority, low-jitter periodic timer calls, which are needed for real-time low-latency audio I/O. You could try experimenting with GCD or mach timers, and see how they compare in jitter and latency to using Audio Unit callbacks to pull audio.


AFAIK. YMMV.

I tried the trick with a dedicated mixer for the analysis module (see Problem 2 above), but I have many problems with the one-to-many connection function (connect:toConnectionPoints:fromBus:format:).
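
For reference, the one-to-many connections are made roughly like this (simplified; the node names are placeholders):

    import AVFoundation

    let engine = AVAudioEngine()
    let generator = AVAudioPlayerNode()      // stands in for SoundGenerator1
    let analysisMixer = AVAudioMixerNode()
    engine.attach(generator)
    engine.attach(analysisMixer)

    let format = AVAudioFormat(standardFormatWithSampleRate: 48000, channels: 2)

    // Fan generator bus 0 out to both the main mixer and the analysis mixer.
    let destinations = [
        AVAudioConnectionPoint(node: engine.mainMixerNode,
                               bus: engine.mainMixerNode.nextAvailableInputBus),
        AVAudioConnectionPoint(node: analysisMixer,
                               bus: analysisMixer.nextAvailableInputBus)
    ]
    engine.connect(generator, to: destinations, fromBus: 0, format: format)

    // The analysis mixer still has to be pulled by something downstream
    // (here, via the muted Analyser -> main mixer path described below).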


Basically, my Engine looks like this: https://drive.google.com/file/d/1JfT9JySd5paFSUGRwQqv2HAB58Lt6lvU/view?usp=sharing,

- Sound generation chain is in blue, sound analysis chain is in green.

- All nodes except the mixers are custom AudioUnits.

- BasicGenerators are dynamically instantiated and destroyed while the app is running.

- Output connections from SoundGenerator1 and SoundGenerator2 are set using the one-to-many connection function.

- Output from the Analyser is muted; the output connection is only there so that the main mixer pulls my analyser.

- Switches are simply connection/disconnection on demand.

- Switch 1 is used to send (or not) SoundGenerator1 into the analyser.

- Switch 2 is used to send (or not) SoundGenerator2 into the analyser.

- SwitchAnalyser is used to activate/deactivate the analyser (by plugging it into / unplugging it from the main mixer's render calls).


All looks good on paper, but:

- It seems to work at the beginning, but as soon as I start to play with the switches, I always end up with audio problems (the multiple connection doesn't work anymore, with the audio going only one way, or even errors in the audio units).

- I tried to stop and reset the engine before changing a connection and then restart it again. I don't get any errors anymore, but the multiple connections still stop working after a few changes.


I don't know if I'm doing something wrong or if there are problems with the one-to-many connections introduced in iOS 9.

I'll try to make a simple sample to illustrate the problem as soon as I can.


Any ideas, anyone?

Your basic problem might be that you are trying to do your analysis in audio real time. I wouldn't do that. I use the audio graph only for live output (to the speaker/headset) and input (from the microphone). Pulling the analysis out of the audio callbacks helps reduce the computing in the audio context to the minimum required.


If needed for analysis, I save any microphone input and generator output samples somewhere else (usually lock-free circular buffers/fifos). Then do the analysis slightly later in the UI thread, since a device can only display the analysis output at frame rate (30, 60 or 120 Hz), not at audio rate (which can be sub-millisecond on newer iOS devices).


Therefore, your analysis mixer, which might be causing a lot of your problems, is completely unneeded. You could use the audio graph mixer only for live output and microphone input (but stash the microphone and each generator's sample data output in some lock-free side channels). Then analyze the latest sample buffers from all the side-channels later (during an NSTImer/CADisplayLink/GPU task, or whatever).