Using MTAudioProcessingTap with AVPlayer requires ring buffer and format conversion?

What I want to do is load a movie file, have it play back in real time (synchronized with a particular clock), and have video and audio samples fed to me in the format I want (much like the AVCapture APIs do). With an AVPlayer + AVPlayerItem, I can create an AVPlayerItemVideoOutput to get the video frames. Great. Audio, though, requires adding an MTAudioProcessingTap to the player item's audioMix.


What's really odd about MTAudioProcessingTap on a player item is that I'm apparently at the mercy of whatever audio format AVFoundation wants to give me. There's seemingly no guarantee about what the format will look like. Compressed? LPCM? Floating point? Integer? Interleaved? Sample rate? I'm betting it's always at least floating-point LPCM (a canonical/standard format), but what about the sample rate? I have no control over that, and in my situation I need the sample rate to be a specific rate. (I also want the audio mixed down to stereo [or split up from mono].)
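To make the channel part concrete, here is a minimal sketch of the mono/stereo handling I have in mind, assuming the tap delivers non-interleaved (planar) float buffers. The function name `to_interleaved_stereo` is hypothetical, and ignoring channels beyond the first two is a naive placeholder, not a proper downmix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: planar float input -> interleaved stereo output.
   channels[c] points at `frames` samples for channel c.
   `out` must have room for 2 * frames floats. */
static void to_interleaved_stereo(const float *const *channels,
                                  size_t channelCount,
                                  size_t frames,
                                  float *out)
{
    assert(channelCount >= 1);

    const float *left = channels[0];
    /* Mono: reuse channel 0 for the right side. Otherwise use channel 1.
       Channels beyond the first two are ignored (assumption: naive,
       not a real surround downmix). */
    const float *right = (channelCount >= 2) ? channels[1] : channels[0];

    for (size_t i = 0; i < frames; i++) {
        out[2 * i]     = left[i];
        out[2 * i + 1] = right[i];
    }
}
```

This only covers channel layout; sample rate is a separate problem.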


Having no choice over the format is really inconvenient, because it seems I have to convert the audio coming out of the tap. The really unfortunate part is that when sample rate conversion is involved, there seems to be a need for a small intermediate ring buffer between the tap and the audio conversion. The ratio of input to output frames in the conversion can be fractional, so the input frames left unused by one tap "process" callback have to be kept around until the next tap "process" callback, where they would be used.
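Here is a rough sketch of the carry-over problem, assuming mono float samples and a single thread. The linear-interpolation resampler is only a stand-in for a real converter, and all names (`SampleRing`, `Resampler`, `resampler_process`) are hypothetical. The point is that the fractional read position and the unconsumed input both survive from one callback to the next.

```c
#include <assert.h>
#include <stddef.h>

#define RING_CAPACITY 8192  /* assumption: larger than any one callback */

/* Fixed-size FIFO of float samples (single-threaded sketch). */
typedef struct {
    float  data[RING_CAPACITY];
    size_t head;   /* index of the oldest stored sample */
    size_t count;  /* number of stored samples */
} SampleRing;

static void ring_init(SampleRing *r) { r->head = 0; r->count = 0; }

/* Appends up to n samples; returns how many fit (excess is dropped). */
static size_t ring_write(SampleRing *r, const float *src, size_t n)
{
    size_t space = RING_CAPACITY - r->count;
    if (n > space) n = space;
    for (size_t i = 0; i < n; i++)
        r->data[(r->head + r->count + i) % RING_CAPACITY] = src[i];
    r->count += n;
    return n;
}

/* Reads the i-th oldest sample without removing it. */
static float ring_peek(const SampleRing *r, size_t i)
{
    assert(i < r->count);
    return r->data[(r->head + i) % RING_CAPACITY];
}

/* Discards the n oldest samples. */
static void ring_drop(SampleRing *r, size_t n)
{
    assert(n <= r->count);
    r->head = (r->head + n) % RING_CAPACITY;
    r->count -= n;
}

typedef struct {
    SampleRing ring;
    double step;  /* input frames per output frame (inRate / outRate) */
    double pos;   /* fractional read position, relative to ring head */
} Resampler;

static void resampler_init(Resampler *rs, double inRate, double outRate)
{
    ring_init(&rs->ring);
    rs->step = inRate / outRate;
    rs->pos  = 0.0;
}

/* Intended to be called once per tap "process" callback: queue the new
   input, emit as many output samples as the queued input allows, and
   keep whatever input is still needed for the next call. */
static size_t resampler_process(Resampler *rs,
                                const float *in, size_t inCount,
                                float *out, size_t outCapacity)
{
    ring_write(&rs->ring, in, inCount);

    size_t produced = 0;
    while (produced < outCapacity) {
        size_t i0 = (size_t)rs->pos;
        if (i0 + 1 >= rs->ring.count)
            break;  /* need two neighbouring samples to interpolate */
        double frac = rs->pos - (double)i0;
        float a = ring_peek(&rs->ring, i0);
        float b = ring_peek(&rs->ring, i0 + 1);
        out[produced++] = (float)(a + (b - a) * frac);
        rs->pos += rs->step;
    }

    /* Drop whole input frames that are fully behind the read position;
       the fractional remainder stays in pos for the next callback. */
    size_t consumed = (size_t)rs->pos;
    if (consumed > rs->ring.count) consumed = rs->ring.count;
    ring_drop(&rs->ring, consumed);
    rs->pos -= (double)consumed;
    return produced;
}
```

Feeding this 480 input frames at 48000 Hz with a 44100 Hz target yields roughly 441 output frames per call, and the leftover state is what keeps the output continuous across calls. A real-time version would also need a lock-free ring if the consumer runs on another thread.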


Anybody following me? Am I wrong? Is there no simpler way to get the AVPlayerItem's audio fed to me in real time in my specified format? I find it hard to imagine I'm the first to go down this path, but so far I can't find any info from anyone's prior experience.

Did you ever end up figuring this out? I'm stuck with the same problem, and using MTAudioProcessingTap is the best solution I've found.