Why is AVAudioEngine unable to convert the input format from my Bluetooth headphones?

I have been using AVAudioEngine to take audio from the mic and send it out over a WebRTC connection. When I use the iPhone's built-in mic, this works as expected. But if I run the app with Bluetooth headphones connected, the engine reports this error when trying to start:

Code Block
[avae]  AVAudioEngine.mm:160   Engine@0x2833e1790: could not initialize, error = -10868
[avae]  AVAEInternal.h:109   [AVAudioEngineGraph.mm:1397:Initialize: (err = AUGraphParser::InitializeActiveNodesInInputChain(ThisGraph, *GetInputNode())): error -10868
Error starting audio engine: The operation couldn’t be completed. (com.apple.coreaudio.avfaudio error -10868.)

I see that error code -10868 is:
Code Block
@constant kAudioUnitErr_FormatNotSupported
Returned if an input or output format is not supported
...
kAudioUnitErr_FormatNotSupported = -10868

but that doesn't seem like it can be quite correct. I know that the output format is supported because the same format works correctly when my headphones are not attached. And I am pretty sure that the input format is supported because I am able to simply hook up Headphones InputNode -> Mixer -> Headphones OutputNode and correctly hear the audio from the mic.

So I can only assume that this means the format conversion is not supported.

My Questions:

  1. Is this a bug?

  2. Is there any way I can work around this?

Notes:

  • My full audio graph looks like this, where all the "mixers" are just AVAudioMixerNodes:

Code Block
// InputNode (Mic)  -> Mic Mixer -\
// >-> WebRTC Mixer -> Tap -> WebRTC Framework
// AudioPlayer 1 -> Player Mixer  -/
//
// AudioPlayer 2 -> Player Mixer -----> LocalOutputMixer -> OutputNode (Device Speakers/Headphones)

but the issue still happens even if I simplify down to this:
Code Block
InputNode (Mic)  -> Mixer -> Tap -> WebRTC Framework

Specifically it happens when a single mixer node is connected with an input format and output format as follows:
  • The input format is:

Code Block
(lldb) po audioEngine.inputNode.inputFormat(forBus: 0).streamDescription.pointee
▿ AudioStreamBasicDescription
- mSampleRate : 16000.0
- mFormatID : 1819304813
- mFormatFlags : 41
- mBytesPerPacket : 4
- mFramesPerPacket : 1
- mBytesPerFrame : 4
- mChannelsPerFrame : 1
- mBitsPerChannel : 32
- mReserved : 0

  • The output format WebRTC expects is:

Code Block
▿ AudioStreamBasicDescription
- mSampleRate : 48000.0
- mFormatID : 1819304813
- mFormatFlags : 12
- mBytesPerPacket : 2
- mFramesPerPacket : 1
- mBytesPerFrame : 2
- mChannelsPerFrame : 1
- mBitsPerChannel : 16
- mReserved : 0

  • My headphones are Jaybird Freedom 2.

Replies

If you look at the mFormatFlags, you'll see that the input format is float (the 0x01 bit of the flags), while the output format is signed integer (the 0x04 bit of the flags). A mixer node can't make that conversion automatically, so you'll have to do it yourself.
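To see exactly which bits are set in the two mFormatFlags values from the question, the flags can be decoded by hand. This is a minimal sketch: the `PCMFormatFlags` type and the `describe` and `fourCharCode` helpers are names made up for illustration, and the bit values are copied from the `kAudioFormatFlag…` constants in CoreAudioTypes so that the snippet runs without importing CoreAudio.

```swift
// Bit values mirror the kAudioFormatFlag… constants in CoreAudioTypes.
// The type itself is hypothetical and exists only for this illustration.
struct PCMFormatFlags: OptionSet {
    let rawValue: UInt32
    static let isFloat          = PCMFormatFlags(rawValue: 1 << 0) // 0x01
    static let isBigEndian      = PCMFormatFlags(rawValue: 1 << 1) // 0x02
    static let isSignedInteger  = PCMFormatFlags(rawValue: 1 << 2) // 0x04
    static let isPacked         = PCMFormatFlags(rawValue: 1 << 3) // 0x08
    static let isAlignedHigh    = PCMFormatFlags(rawValue: 1 << 4) // 0x10
    static let isNonInterleaved = PCMFormatFlags(rawValue: 1 << 5) // 0x20
    static let isNonMixable     = PCMFormatFlags(rawValue: 1 << 6) // 0x40
}

/// Returns the names of the flags set in a raw mFormatFlags value, lowest bit first.
func describe(_ rawFlags: UInt32) -> [String] {
    let names: [(PCMFormatFlags, String)] = [
        (.isFloat, "float"),
        (.isBigEndian, "big-endian"),
        (.isSignedInteger, "signed integer"),
        (.isPacked, "packed"),
        (.isAlignedHigh, "aligned high"),
        (.isNonInterleaved, "non-interleaved"),
        (.isNonMixable, "non-mixable"),
    ]
    let flags = PCMFormatFlags(rawValue: rawFlags)
    return names.filter { flags.contains($0.0) }.map { $0.1 }
}

/// Decodes a four-character code such as mFormatID into its ASCII form.
func fourCharCode(_ value: UInt32) -> String {
    let bytes = [24, 16, 8, 0].map { UInt8((value >> $0) & 0xFF) }
    return String(decoding: bytes, as: UTF8.self)
}

print(describe(41))               // input format flags from the question
print(describe(12))               // output format flags WebRTC expects
print(fourCharCode(1819304813))   // mFormatID shared by both formats
```

Decoded this way, 41 is float, packed, non-interleaved, and 12 is signed integer, packed. The mFormatID 1819304813 is 'lpcm' (linear PCM) in both cases, so the two formats differ in sample type, bit depth, and sample rate rather than in encoding.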

Note that the input format is constrained when you use a Bluetooth headset (i.e. headphones + mic), because it uses the Hands-Free Profile (HFP).

Thanks @Polyphonic! I'll give that a shot.

"A mixer node can't make that conversion automatically, so you'll have to do it yourself."

I've been back at this, trying to do the conversion myself, but I can't figure out how.

  • Is there a way to insert a custom "conversion node" into the AVAudioEngine chain to handle this between the input node and the mixer?

I have also tried putting a tap on the mixer's output, thinking I might be able to do some pointer math on the result if the mixer had "failed" to convert correctly. But the format on that tap's buffer reports that the data has already been converted to Int, and floatChannelData is nil. So I'm not sure how to detect that the conversion needs to be done, or where to get the float data that needs converting.
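For reference, the sample-type half of that conversion (Float32 to Int16) is small when written by hand. The following is only a sketch: `int16Samples(from:)` is a hypothetical helper, it assumes finite samples normalized to the range -1...1, and it does not resample, so the 16 kHz to 48 kHz step would still have to happen somewhere else.

```swift
/// Converts normalized Float32 samples to Int16 samples (hypothetical helper).
/// Values outside -1...1 are clamped before scaling so the result cannot overflow.
func int16Samples(from floats: [Float]) -> [Int16] {
    floats.map { sample in
        let clamped = max(-1.0, min(1.0, sample))
        // Scale symmetrically by Int16.max and round to the nearest integer.
        return Int16((clamped * Float(Int16.max)).rounded())
    }
}
```

With an AVAudioPCMBuffer in a float format, the input samples would come from floatChannelData, which is only non-nil when the buffer's format really is Float32.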

I also wondered if I need to set up an extra AVAudioEngine, obtain the unconverted mic data from its output, convert it to Int16, and then play it into a player node on the main engine. But that seems like it would be expensive, and a rather awkward approach.

Can you help get me started on how that conversion should happen?
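
One possible starting point, offered as a sketch rather than a confirmed fix: tap the input node in its own format and run each buffer through an AVAudioConverter, which can handle both the Float32 to Int16 change and the 16 kHz to 48 kHz resampling. The names `MicFormatConverter`, `startCapturing`, and `deliverToWebRTC` are made up for illustration, and the code assumes the audio session is already configured for recording and that the WebRTC side accepts an AVAudioPCMBuffer.

```swift
import AVFoundation

/// Converts tapped microphone buffers to 48 kHz, 16-bit integer, mono PCM.
/// Hypothetical wrapper around AVAudioConverter.
final class MicFormatConverter {
    let outputFormat: AVAudioFormat
    private let converter: AVAudioConverter

    init?(inputFormat: AVAudioFormat) {
        guard let target = AVAudioFormat(commonFormat: .pcmFormatInt16,
                                         sampleRate: 48_000,
                                         channels: 1,
                                         interleaved: true),
              let converter = AVAudioConverter(from: inputFormat, to: target)
        else { return nil }
        self.outputFormat = target
        self.converter = converter
    }

    /// Converts one input buffer; returns nil if allocation or conversion fails.
    func convert(_ input: AVAudioPCMBuffer) -> AVAudioPCMBuffer? {
        // Size the output for the sample-rate ratio, plus one frame of slack.
        let ratio = outputFormat.sampleRate / input.format.sampleRate
        let capacity = AVAudioFrameCount((Double(input.frameLength) * ratio).rounded(.up)) + 1
        guard let output = AVAudioPCMBuffer(pcmFormat: outputFormat,
                                            frameCapacity: capacity) else { return nil }

        var consumed = false
        var error: NSError?
        let status = converter.convert(to: output, error: &error) { _, inputStatus in
            // Hand the tapped buffer to the converter exactly once per call.
            if consumed {
                inputStatus.pointee = .noDataNow
                return nil
            }
            consumed = true
            inputStatus.pointee = .haveData
            return input
        }
        return status == .error ? nil : output
    }
}

/// Hypothetical hand-off point; replace with whatever the WebRTC framework expects.
func deliverToWebRTC(_ buffer: AVAudioPCMBuffer) {}

/// Installs a tap on the input node and starts the engine.
func startCapturing(with engine: AVAudioEngine) throws {
    let input = engine.inputNode
    let micFormat = input.outputFormat(forBus: 0)
    guard let micConverter = MicFormatConverter(inputFormat: micFormat) else { return }

    input.installTap(onBus: 0, bufferSize: 1024, format: micFormat) { buffer, _ in
        if let pcm16 = micConverter.convert(buffer) {
            deliverToWebRTC(pcm16)
        }
    }
    engine.prepare()
    try engine.start()
}
```

Because the conversion happens in the tap callback, no mixer is asked to produce Int16 output, which is the step the engine rejects. Mixing the players into the WebRTC stream would still need its own path, for example a mixer kept in a float format whose tapped output goes through a second converter of the same kind.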