In case it helps, here is my question put more simply:
How can I use my own audio processing/generating code within an AVAudioEngine?
All the examples I found online explain how to use AVAudioPlayer to play a file or a pre-computed buffer, or how to use the built-in AVAudioUnitEffect subclasses (Delay, Distortion, EQ, Reverb) to apply FX to mic input (and then output it through headphones or store it in a file). But I can't find how to integrate my own process, with its own render callback, into this architecture...
Please help!
Not sure if this is what you're asking, but you can add a render callback on the last node that's pulled in the AVAudioEngine chain. The "correctness" of doing this is unclear, but it does work. Just set the kAudioUnitProperty_SetRenderCallback property on the audio unit of the node you want to pass your audio into. That way you can generate your audio into the buffers in the callback. The documentation is all far too sparse, but from my experience with it I think AVAudioEngine is really just a higher-level / simpler way to build AUGraphs; underneath, it all still runs at the component level. In AVAudioEngine most but not all nodes have an audioUnit property; it depends on what the node is. The mixer node, for example, doesn't, but you (I) generally don't need a mixer, and I think it's not even instantiated until you actually try to use it. (It's created lazily when you access the mainMixerNode property.)
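For reference, here is a minimal Swift sketch of that approach. The delay node is just a placeholder host (any node exposing an audioUnit property would do), and the callback simply writes silence where your generator code would go:

```swift
import AVFoundation
import AudioToolbox

// Sketch: install a render callback on the underlying audio unit of an
// AVAudioEngine node. The delay node is only a placeholder host.
let engine = AVAudioEngine()
let delay = AVAudioUnitDelay()
engine.attach(delay)
engine.connect(delay, to: engine.mainMixerNode, format: nil)

// Real-time callback: generate samples into ioData (here, just silence).
let renderCallback: AURenderCallback = { _, _, _, _, _, ioData in
    guard let ioData = ioData else { return noErr }
    for buffer in UnsafeMutableAudioBufferListPointer(ioData) {
        memset(buffer.mData, 0, Int(buffer.mDataByteSize))
    }
    return noErr
}

var callbackStruct = AURenderCallbackStruct(inputProc: renderCallback,
                                            inputProcRefCon: nil)
AudioUnitSetProperty(delay.audioUnit,
                     kAudioUnitProperty_SetRenderCallback,
                     kAudioUnitScope_Input,
                     0,                     // input bus 0
                     &callbackStruct,
                     UInt32(MemoryLayout<AURenderCallbackStruct>.size))
```

Whether replacing a node's input like this is supported behavior is, as said above, unclear; it just happens to work because the engine runs on audio components underneath.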
Hope this helps. Anyone feel free to correct me because I'm still getting my head around all the latest changes too.
Thanks for this trick. As you say, the "correctness" of this seems rather unclear.
Moreover, since I asked this question, I finally understood something important: for audio processing apps on iOS, AVAudioEngine in its iOS 8.0 version seems rather limited. However, it makes a lot more sense with the AudioUnit v3 API introduced in iOS 9.0, which lets you use custom AudioUnits on iOS, and therefore take full practical advantage of the AVAudioEngine architecture even if you have a lot of custom audio processing code.
As I always keep my apps backward-compatible with one previous iOS version, I'll explore that next year.
Two years later, I still have the same problem.
I've recently heard that AUGraph will be deprecated in 2018, so I'm looking for a way to use my own code to manually generate/process audio data within the new AVAudioEngine framework.
Does anybody have a clue how we are supposed to do that?
Thank you all for your help
I'm not 100% sure what your goal is but you should be able to create AVAudioUnits which will be backed by AUAudioUnits. And in AUAudioUnit you can have your own render blocks. Additionally, as of iOS 11 you have manual rendering options in AVAudioEngine (https://developer.apple.com/videos/play/wwdc2017/501/)
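To illustrate the second option, here is a hedged sketch of the iOS 11 manual rendering mode mentioned above, where you pull buffers from the engine yourself instead of letting the hardware drive rendering (the graph here is empty, so the mixer just renders silence):

```swift
import AVFoundation

// Sketch of manual (offline) rendering, available since iOS 11. The engine
// is driven by renderOffline(_:to:) instead of the audio hardware.
func renderOneBuffer() throws {
    let engine = AVAudioEngine()
    let format = AVAudioFormat(standardFormatWithSampleRate: 48000,
                               channels: 2)!

    // Must be enabled before the engine is started.
    try engine.enableManualRenderingMode(.offline, format: format,
                                         maximumFrameCount: 4096)
    try engine.start()

    let buffer = AVAudioPCMBuffer(
        pcmFormat: engine.manualRenderingFormat,
        frameCapacity: engine.manualRenderingMaximumFrameCount)!

    // Pull one buffer; inspect `status` (.success, .insufficientDataFromInputNode, ...).
    let status = try engine.renderOffline(4096, to: buffer)
    print(status, buffer.frameLength)
}
```

This also pairs well with an AUAudioUnit subclass whose internalRenderBlock contains your own DSP code, as discussed below.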
I also heard the bit in a WWDC session about AUGraph being deprecated. I wanted a future-proof solution using AVAudioEngine instead. So I wrote a test app that instantiates an AUAudioUnit subclass with a callback block. The unit is then connected to AVAudioEngine to play the generated sound samples. Seems to work under iOS (device and Simulator), and with fairly low latency (small callback buffers). The source code for my test app is posted on github (search for hotpaw2). There's a hotpaw2 github gist on recording audio as well. Let me know if any of that helps, or if there is a better way to do this.
I just spent the last month playing around with AVAudioEngine and custom AVAudioUnits. Thanks to the auv3test5 test app (thank you so much hotpaw2), I finally managed to easily embed custom rendering code inside an AVAudioEngine, for both audio generation and audio processing.
I now want to integrate a custom audio analysis module in my engine. The immediate solution seems to be the tap-on-bus, but I face two problems with it:
I don't understand how to control the buffer size of the tap block. There is a bufferSize input variable on the installTapOnBus function but, as mentioned in the documentation, "The implementation may choose another size.", and it does in my case (so what's the point of the bufferSize input???). This causes some implementation problems (I don't even know what maximum buffer size to expect). Also, as I use the results of my analysis for graphic visualisations, I can't control the refresh rate of my visualisations, and generally end up with 10 refreshes/sec (4800-frame buffers at a 48000 Hz sample rate, even though I asked for 1024-frame buffers), which is not satisfying.
I want to make a custom mix of some nodes' output for the analysis module.
- I tried to use a mixer node to do that, but for the tap block to be called, the mixer node's output bus on which the tap is installed must be connected in some way to the main output node. So the only solution I found so far is to plug my "analysis mixer node" into the main mixer node and set its volume to 0, but that's not very satisfying, is it? Also, the documentation of installTapOnBus seems to mention that the tap can be installed on an output bus which is not connected to anything except the tap: "This should only be done when attaching to an output bus which is not connected to another node", but in this case, my tap block is never called.
- I tried to explore the idea of building some kind of custom AVAudioOutputNode (which has input buses, but no output buses), but I didn't find anything about it, and really don't see how to build it, especially given the problem of scheduling input calls, which is not a problem when your custom AVAudioNode is connected in some way to the main output node, which takes care of scheduling rendering calls.
The only solution I found for both these problems is to build a custom processing node connected to the main mixer but always outputting silence. This way, I can:
- choose a maximumFramesToRender (which at least sets a maximum buffer size for my analysis module),
- use a dedicated mixer node,
- be scheduled by the main output node.
But once again, this seems weird, and I have a useless input bus on my main mixer...
Any ideas, anyone?
Thanks to all
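To illustrate problem 1, here is a minimal sketch of the tap in question; the 1024-frame request is only a hint, and logging the delivered frame length shows what the implementation actually chose:

```swift
import AVFoundation

// Sketch: request a 1024-frame tap buffer and log what is actually delivered.
let engine = AVAudioEngine()
let mixer = engine.mainMixerNode

mixer.installTap(onBus: 0, bufferSize: 1024, format: nil) { buffer, _ in
    // bufferSize is only a hint: on many devices this logs 4410 or 4800
    // frames regardless of the requested 1024.
    print("tap delivered \(buffer.frameLength) frames")
}
```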
Regarding Problem 1:
For real-time audio processing, instead of trying to control the buffer size of the tap block, one can instantiate a custom AUAudioUnit effect unit, and connect that unit between the audio engine inputNode and the main mixer. This allows configuring and processing shorter (lower latency) audio sample buffers from the microphone than using an installTap().
I set up my effect unit to pass the shorter input sample buffers from the microphone to my 60 Hz visualization routines (via a lock free circular buffer); and also have the effect unit output silence to the mixer. The mixer is connected to do the pulling of the (hidden?) audio graph.
Thank you for your answer
I came to basically the same conclusion for the moment: instantiate a custom processing unit with either
- a pass-through (in addition to the analysis work) somewhere inside the graph,
- or a silent output (in addition to the analysis work) connected to the main mixer, in the case where the audio signal to be analysed is not to be sent to the main output (problem 2: I have a dedicated mixer for the analysis signal).
It works, but that's not very satisfying, as:
- I still don't understand the point of the bufferSize input of the installTapOnBus function as it doesn't seem to have any impact...
- In my case, I make the main mixer process a useless input (even though I hope the kAudioUnitRenderAction_OutputIsSilence flag raised for the silent output of my analysis unit is taken into account).
And yes, it seems pretty obvious to me that AVAudioEngine is built on top of AUGraph, and that AVAudioEngine connections rely on the render callbacks mechanics. I think I have read something about this a while back but I can't find it again. I'll let you know if I find it.
TapOnBus seems to be designed to be easier to set up and use by less experienced programmers. A tap seems designed to allow standard Cocoa/Swift/Objective-C programming practices, such as using Swift data types, calling methods, synchronizing accesses, and allocating memory. All this requires more buffering to allow time to do it safely. Thus, a tap seems to ignore requests for a buffer size too small to safely allow all of these programmer-friendly practices.
Whereas, the callback functions and blocks for Audio Units are called in a real-time context. Safe use of the audio context requires special real-time coding practices (such as deterministic code with no Swift data types, no Objective C methods, no memory allocation or release, no semaphores or locks, etc. etc.). Thus, a good programmer can get away with asking for 5 millisecond to even sub-millisecond buffer sizes.
As for why use mixer inputs (or RemoteIO)? They appear to be a good source for a high priority low jitter periodic timer calls, which are needed for real-time low-latency audio IO. You could try experimenting with GCD or mach timers, and see how they compare in jitter and latency to using Audio Unit callbacks to pull audio.
I tried the trick with a dedicated mixer for the analysis module (see Problem 2 above), but I have many problems with the one-to-many connection function (connect:toConnectionPoints:fromBus:format:).
Basically, my Engine looks like this: https://drive.google.com/file/d/1JfT9JySd5paFSUGRwQqv2HAB58Lt6lvU/view?usp=sharing,
- Sound generation chain is in blue, sound analysis chain is in green.
- All nodes except the mixers are custom AudioUnits.
- BasicGenerators are dynamically instantiated and destroyed while the app is running.
- Output connections from SoundGenerator1 and SoundGenerator2 are set using the one-to-many connection function.
- Output from Analyser is mute, the output connection is only here to use the calls from the main mixer in my analyser.
- Switches are simply connection/disconnection on demand.
- Switch 1 is used to send or not send SoundGenerator1 into the analyser.
- Switch 2 is used to send or not send SoundGenerator2 into the analyser.
- SwitchAnalyser is used to activate/deactivate the analyser (by plugging into / unplugging from the main mixer calls).
All looks good on paper, but :
- It seems to work at the beginning, but as soon as I start to play with the switches, I always end up with audio problems (multiple connections don't work anymore, the audio goes only one way, or I even get errors in the audio units).
- I tried to stop and reset the engine before changing a connection and then restart it again. I don't get any errors anymore, but the multiple connections still stop working after a few changes.
I don't know if I'm doing something wrong or if there are problems with the one-to-many connections introduced in iOS 9.
I'll try to make a simple sample to illustrate the problem as soon as I can.
Any ideas, anyone?
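For reference, the one-to-many connections above are set up with something like the following sketch (node names mirror the diagram and are placeholders; a stock delay unit stands in for the custom generator):

```swift
import AVFoundation

// Sketch of the one-to-many connection API (iOS 9+).
let engine = AVAudioEngine()
let soundGenerator1 = AVAudioUnitDelay()   // stand-in for a custom unit
let analysisMixer = AVAudioMixerNode()
engine.attach(soundGenerator1)
engine.attach(analysisMixer)

let format = AVAudioFormat(standardFormatWithSampleRate: 48000, channels: 2)

// Fan the generator's output bus 0 out to both mixers.
let points = [
    AVAudioConnectionPoint(node: engine.mainMixerNode,
                           bus: engine.mainMixerNode.nextAvailableInputBus),
    AVAudioConnectionPoint(node: analysisMixer,
                           bus: analysisMixer.nextAvailableInputBus)
]
engine.connect(soundGenerator1, to: points, fromBus: 0, format: format)
```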
Your basic problem might be that you are trying to do your analysis in audio real time. I wouldn't do that. I use the audio graph only for live output (to the speaker/headset) and input (from the microphone). Pulling analysis out of the audio callbacks helps reduce the computing in the audio context to the minimum required.
If needed for analysis, I save any microphone input and generator output samples somewhere else (usually lock-free circular buffers/fifos). Then do the analysis slightly later in the UI thread, since a device can only display the analysis output at frame rate (30, 60 or 120 Hz), not at audio rate (which can be sub-millisecond on newer iOS devices).
Therefore, your analysis mixer, which might be causing a lot of your problems, is completely unneeded. You could use the audio graph mixer only for live output and microphone input (but stash the microphone and each generator's sample data output in some lock-free side channels). Then analyze the latest sample buffers from all the side channels later (during an NSTimer/CADisplayLink/GPU task, or whatever).
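A minimal sketch of such a lock-free single-producer/single-consumer FIFO (the audio thread writes, the UI thread reads). This is a simplification: a production version would need atomic loads/stores on the head/tail counters, which this sketch glosses over:

```swift
import Foundation

// Single-producer / single-consumer ring buffer for Float samples.
// Capacity must be a power of two so indices can wrap with a bitmask.
final class SampleFIFO {
    private let capacity: Int
    private let mask: Int
    private var storage: [Float]
    private var head = 0   // advanced only by the producer (audio thread)
    private var tail = 0   // advanced only by the consumer (UI thread)

    init(capacityPowerOfTwo capacity: Int) {
        precondition(capacity > 0 && capacity & (capacity - 1) == 0)
        self.capacity = capacity
        self.mask = capacity - 1
        self.storage = [Float](repeating: 0, count: capacity)
    }

    // Called from the audio render thread; drops samples if the FIFO is full.
    func write(_ samples: UnsafePointer<Float>, count: Int) {
        let free = capacity - (head - tail)
        let n = min(count, free)
        for i in 0..<n { storage[(head + i) & mask] = samples[i] }
        head += n
    }

    // Called from the UI thread; returns how many samples were actually read.
    func read(into out: inout [Float]) -> Int {
        let available = head - tail
        let n = min(out.count, available)
        for i in 0..<n { out[i] = storage[(tail + i) & mask] }
        tail += n
        return n
    }
}
```

The write side never allocates or locks, which keeps it safe to call from the real-time render block; the read side can then be driven at display rate by a CADisplayLink.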
I completely see your point about running the analysis job outside the audio thread, even though:
- I never had any trouble related to time spent on analysis inside the audio thread (I always take a close look at the computational cost of audio generation/processing/analysis algorithms before integrating one into the audio render loop).
- It is most common in signal processing to need precise scheduling of analysis over the input data, so it seems weird to me to schedule the analysis job with UI calls (take the example of a classic f0-detection algorithm working on 2048-frame input buffers with a 1024-frame overlap). Even more so when the result of the analysis is used not for UI but for storage or real-time audio feedback. But I must admit that I have much more background in theoretical signal processing than in programming, so I'm far from an expert in threading.
Moreover, after spending some time testing and playing around with the one-to-many connection, I realised that this problem is the same in a pure audio processing context, where it becomes much more serious. Take for example my previous graph and replace "analysis" with "side FX". In that case too, playing with the switches (connecting/disconnecting the inputs of the "side FX mixer" or the output of the "side FX process") will inevitably break the engine after a few occurrences, even if you stop and restart the engine at each connection change (which makes audio pops, by the way).
And in the case of an advanced modular audio app, this feature doesn't seem far-fetched but really useful to me!
I'm currently building a minimal project to illustrate that problem and send a bug report. I'll try to make it available to you if I can manage to use none of the proprietary code from my company.
Thanks again for all your help, it is much appreciated
It seems that we're not so many out there using these advanced features of AVFoundation and AVAudioEngine, and I must say that the documentation, as well as the WWDC videos on these subjects, is a bit frustrating when you try to understand how to implement these promising new features.
If you can create a reproducible example, please send a bug report to Apple. That's how they prioritize fixes. There should be a link at the bottom of the forum web page.
BTW, my suggestion of audio analysis outside the graph was for visualization purposes. For live (re)synthesis, analysis probably should go in the audio output graph for lowest latency. Another suggestion: when making a change to the graph, you might want to consider doing a quick fade to/from silence before/after the change.
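That fade suggestion can be sketched roughly as below; the ramp timing (ten 10 ms steps, ~100 ms each way) is an assumption, and a real app would probably drive the ramp from a timer rather than sleeping on a background queue:

```swift
import AVFoundation

// Sketch: fade the main mixer to silence, apply a graph change, fade back in.
func withFade(engine: AVAudioEngine, change: @escaping () -> Void) {
    let mixer = engine.mainMixerNode
    DispatchQueue.global(qos: .userInitiated).async {
        // Ramp down over ~100 ms to avoid a pop.
        for step in stride(from: 10, through: 0, by: -1) {
            mixer.outputVolume = Float(step) / 10
            usleep(10_000)                 // 10 ms per step
        }
        change()                           // disconnect/reconnect nodes, etc.
        // Ramp back up.
        for step in 0...10 {
            mixer.outputVolume = Float(step) / 10
            usleep(10_000)
        }
    }
}
```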