I have had a play with this and tried inserting a second, intermediate AVAudioMixerNode. My theory is that when I connect the two mixers I can specify the format conversion at that point, and then connect the output of intermediateMixer to the AVAudioSinkNode.
When I try this I get peculiar results:
let outputFormat = engine.outputNode.outputFormat(forBus: 0)
let inputFormat = engine.inputNode.outputFormat(forBus: 0)
let requiredFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                   sampleRate: Double(sampleRate),
                                   channels: 1,
                                   interleaved: false)
let formatMixer = AVAudioMixerNode()
engine.attach(formatMixer)
let intermediateMixer = AVAudioMixerNode()
engine.attach(intermediateMixer)
engine.connect(input, to: formatMixer, format: inputFormat)
engine.connect(formatMixer, to: intermediateMixer, format: requiredFormat)
engine.attach(MicSinkNode)
engine.connect(intermediateMixer, to: MicSinkNode, format: nil)
If I print to console the various formats along the way, I get:
Output Format is <AVAudioFormat 0x600002143930: 4 ch, 48000 Hz, Float32, non-inter>
InputNode Format is <AVAudioFormat 0x60000211ae90: 4 ch, 48000 Hz, Float32, non-inter>
Required Format for input is: Optional(<AVAudioFormat 0x60000213ed00: 1 ch, 48000 Hz, Float32>)
FormatMixer Format is <AVAudioFormat 0x6000021496d0: 1 ch, 48000 Hz, Float32>
Intermediate Format is <AVAudioFormat 0x600002149770: 1 ch, 48000 Hz, Float32>
MicSinkNode Format is <AVAudioFormat 0x600002149810: 2 ch, 44100 Hz, Float32, non-inter>
Where does MicSinkNode get this format from? If I compare my original signal to the result from MicSinkNode, the result is the same length with no glitches, despite apparently being at a different sample rate at this stage.
What is interesting is that if I change the last connect line to:
engine.connect(intermediateMixer, to: MicSinkNode, format: requiredFormat)
so that all of the formats match and the MicSinkNode format is the same as requiredFormat, then on comparing the WAVs the output of MicSinkNode is shorter than the original signal (as if 44.1 kHz audio had been read directly into a 48 kHz stream), and regular clock-slip glitches are audible in the audio.
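To put a rough number on that sample-rate suspicion, here is a minimal sketch of the arithmetic. The helper name apparentDuration and the 10-second clip length are made up for illustration and are not from my project. It assumes frames really are delivered at 44.1 kHz and then written out labelled as 48 kHz, which I have not confirmed.

```swift
import Foundation

// Hypothetical helper (not part of AVFoundation): the duration a clip appears
// to have when frames captured at `capturedRate` are stored and played back
// as if they had been captured at `assumedRate`.
func apparentDuration(ofSeconds seconds: Double,
                      capturedRate: Double,
                      assumedRate: Double) -> Double {
    let frames = seconds * capturedRate   // frames actually delivered
    return frames / assumedRate           // time those frames span at the assumed rate
}

// Illustrative values only: a 10 s clip, 44.1 kHz delivered, 48 kHz assumed.
let original = 10.0
let apparent = apparentDuration(ofSeconds: original,
                                capturedRate: 44_100,
                                assumedRate: 48_000)
print(String(format: "%.4f s", apparent))   // prints "9.1875 s"
```

So a 10 s capture would come out at 9.1875 s, about 8% short, which is the size of shortfall I would expect to see if that is what is happening.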
So how do I correctly get an input signal into a single 48 kHz buffer that I can read into an array?