AVAudioEngine: How to convert sample rate when writing to a file?

I've looked all over the place and have not yet found a working example that can use AVAudioEngine and do a sample-rate conversion when writing to an output file. Here are the things that I've tried:


1) Change the output format on the inputNode. Although the docs say this is possible (ref: https://developer.apple.com/documentation/avfoundation/avaudioinputnode, text: "The format of the output scope is initially the same as that of the input, but you may set it to a different format, in which case the node will convert."), this actually results in a crash (see the sketch after this list): "required condition is false: format.sampleRate == hwFormat.sampleRate"


Also, according to theanalogkid here: https://forums.developer.apple.com/message/201324#201324, this isn't possible anyway: "We currently don't provide sample rate conversion on an input node - if one destination is a file, use AVAudioFile and let it perform the conversion and move your tap to the input node." So possibly the docs are out of date, or they refer to a different situation I have seen, where plugging in a microphone causes the input node's input format to change to 44100 Hz while leaving its output format at 48000 Hz.


I'm not sure how AVAudioFile can be used to perform the conversion, since it wants samples in its 'processingFormat', which might not match the input node's format, especially given these microphone-driven changes.


2) Changing the format of the input node's tap results in the same error.


3) I've tried adding a tap on the mainMixerNode instead with the sample rate that I want, but then the audio engine breaks and my tap callback never gets called.
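
Roughly, attempt 1 looked like this (the 44.1 kHz format is just an example target; `engine` is the AVAudioEngine):

let engine = AVAudioEngine()
let desiredFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                  sampleRate: 44100,
                                  channels: 1,
                                  interleaved: false)!

// Connecting the input node with a sample rate that differs from the
// hardware rate is what triggers the crash:
// "required condition is false: format.sampleRate == hwFormat.sampleRate"
engine.connect(engine.inputNode, to: engine.mainMixerNode, format: desiredFormat)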


The only solution I have found so far is to use AVAudioConverter, an API that is undocumented in Apple's online docs. This solution, used inside a tap callback block, feels ham-fisted:


writerSerialQueue.async {
    do {
        // Output buffer in the target processing format. Capacity matches
        // the incoming buffer, which is enough when downsampling.
        let outBuffer = AVAudioPCMBuffer(pcmFormat: processingFormat,
                                         frameCapacity: pcmBuffer.frameCapacity)!
        var error: NSError?
        var numCalls = 0

        // Hand the tap's buffer to the converter exactly once, then
        // report that no more input is available for this call.
        converter.convert(to: outBuffer, error: &error, withInputFrom: { (inNumPackets, outStatus) -> AVAudioBuffer? in
            if numCalls == 0 {
                outStatus.pointee = .haveData
                numCalls += 1
                return pcmBuffer
            } else {
                outStatus.pointee = .noDataNow
                return nil
            }
        })
        try audioFile.write(from: pcmBuffer)
    } catch {
        // Handle/log the error.
    }
}
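
For context, here's roughly how the pieces referenced above (converter, processingFormat, audioFile) can be created; the 44.1 kHz mono format and outputURL are example values:

let inputFormat = engine.inputNode.inputFormat(forBus: 0)

// Example target format: 44.1 kHz mono, deinterleaved float.
let processingFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                     sampleRate: 44100,
                                     channels: 1,
                                     interleaved: false)!

let converter = AVAudioConverter(from: inputFormat, to: processingFormat)!
let audioFile = try AVAudioFile(forWriting: outputURL,
                                settings: processingFormat.settings)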


Please tell me it's not supposed to be this difficult and that I've missed an easy or obvious solution?

Replies

So the solution I settled on for now uses an undocumented feature of AVAudioEngine:


1) The reason I needed to use this is that the input node's format changes when a microphone is inserted or removed. If this is not taken into account when writing to an output file, then the file will contain audio data at the incorrect pitch and speed.


2) The format passed to .installTap must either match the input node's ***input format*** or be nil.


3) However, if .installTap is called with the input node's input format instead of a nil format, then when a microphone is inserted or removed, the input format changes to reflect that (and AVAudioEngineConfigurationChange is posted), but the ***output*** format remains the same as before! This is an undocumented way of getting the input node to perform a sample rate conversion, and so far I've only observed it when the HW sample rate changes as a result of a microphone being inserted or removed.
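
As an aside, the configuration change in 3) can be watched with the standard notification; a minimal sketch (here `engine` is the running AVAudioEngine):

NotificationCenter.default.addObserver(
    forName: .AVAudioEngineConfigurationChange,
    object: engine,
    queue: .main
) { _ in
    // The input node's *input* format may have changed here; per 3) above,
    // its *output* format stays the same, so the tap keeps delivering
    // buffers in the original format. The engine typically needs to be
    // restarted after this notification.
    print("New input format: \(engine.inputNode.inputFormat(forBus: 0))")
}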


Therefore, I can record to an audio file and not have to worry about performing my own sample rate conversions, even when the sample rate changes due to microphone state changes, as long as the following is respected (a sketch appears at the end of this reply):


1) When creating the AVAudioFile for output, the format should match the input node's ***output format***, not the input format. This is the format that the audio tap is going to be sending to you.


2) The tap should only be created once, after acquiring the session and input for the first time. This is because the tap format must always match the input node's input format, so creating the tap after the HW sample rate has changed means that we'll lose the 'free' sample rate conversion that the input node provides us.


3) To remove the sample rate conversion (for example, upon starting a new recording), the tap can be re-created or the AVAudioEngine can simply be reinitialized.


So this behavior helps account for sample rate changes due to mic insertions/removals without requiring the use of AVAudioConverter. Unfortunately, the input node will only perform a sample rate conversion when it changes the sample rate itself, so converting to any sample rate other than the initial HW sample rate will still require the use of AVAudioConverter.
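
Putting the three rules together, here's a minimal sketch of the whole recording setup (class and method names are placeholders, and error handling is elided):

import AVFoundation

final class Recorder {
    private let engine = AVAudioEngine()
    private var file: AVAudioFile?

    func start(writingTo url: URL) throws {
        let input = engine.inputNode

        // Rule 1: the file's format matches the input node's *output*
        // format -- the format the tap will actually deliver.
        file = try AVAudioFile(forWriting: url,
                               settings: input.outputFormat(forBus: 0).settings)

        // Rule 2: install the tap once, with the input node's *input*
        // format. If the HW sample rate changes later, the node keeps
        // converting to the original output format for us.
        input.installTap(onBus: 0,
                         bufferSize: 4096,
                         format: input.inputFormat(forBus: 0)) { [weak self] buffer, _ in
            try? self?.file?.write(from: buffer)
        }

        try engine.start()
    }

    // Rule 3: to drop the implicit conversion, remove the tap and stop the
    // engine, then re-create both for the next recording.
    func stop() {
        engine.inputNode.removeTap(onBus: 0)
        engine.stop()
        file = nil
    }
}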

Just a question: why do you write pcmBuffer to the file instead of outBuffer? I'm taking a similar approach, where I want to downsample mic input and store it in a file using AVAudioEngine, and the lack of documentation is really disappointing. Thanks in advance!