CoreAudio clock drifting from main thread? Callback consuming audio samples faster than the main thread produces them

I'm trying to use the CoreAudio API to set up sound playback. I'm not using any of the fancy stuff like AudioUnits or AudioQueues. I just get a handle to the audio device ID and set up a callback to write audio.


My understanding is that the OS calls this callback periodically on a separate thread whenever it needs more audio. My main thread writes the audio samples to a lock-free ring buffer (using OSAtomic.h for the atomic operations), which the callback then reads from to write its samples.
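The post doesn't show the ring buffer itself, but a minimal sketch of the kind of single-producer/single-consumer buffer it describes (one shared count, updated with OSAtomic.h) might look like this; all names and the capacity are hypothetical:

```c
#include <libkern/OSAtomic.h>  // OSAtomicAdd32Barrier, as the post mentions
#include <stdint.h>

// Hypothetical SPSC ring buffer: the indices are each touched by only one
// thread; the shared occupancy count is the only cross-thread variable.
#define RING_CAPACITY 8192  // samples; assumed, not from the post

typedef struct {
    float            samples[RING_CAPACITY];
    uint32_t         writeIndex;   // producer (main thread) only
    uint32_t         readIndex;    // consumer (audio callback) only
    volatile int32_t count;        // shared occupancy, updated atomically
} RingBuffer;

// Producer side: returns the number of samples actually written.
static uint32_t RingBufferWrite(RingBuffer *rb, const float *src, uint32_t n) {
    uint32_t space = RING_CAPACITY - (uint32_t)rb->count;
    if (n > space) n = space;
    for (uint32_t i = 0; i < n; i++) {
        rb->samples[rb->writeIndex] = src[i];
        rb->writeIndex = (rb->writeIndex + 1) % RING_CAPACITY;
    }
    OSAtomicAdd32Barrier((int32_t)n, &rb->count);  // publish after data is in place
    return n;
}

// Consumer side (audio callback): returns the number of samples actually read.
static uint32_t RingBufferRead(RingBuffer *rb, float *dst, uint32_t n) {
    uint32_t avail = (uint32_t)rb->count;
    if (n > avail) n = avail;
    for (uint32_t i = 0; i < n; i++) {
        dst[i] = rb->samples[rb->readIndex];
        rb->readIndex = (rb->readIndex + 1) % RING_CAPACITY;
    }
    OSAtomicAdd32Barrier(-(int32_t)n, &rb->count);
    return n;
}
```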


The problem I am running into is that when my program starts there might be, say, 2000 samples in the ring buffer, but over time (~30 seconds) it slowly dwindles to 0, and I get audio skips because there is nothing for the callback thread to read. It seems like the callback thread is reading samples faster than the main thread is writing them. Is that likely the case? It seems unlikely to me. I would expect the number of samples in the ring buffer to oscillate over time, not steadily decrease.


To clarify, I am targeting OS X and writing in Objective-C.


I'm at a loss as to what to do, and I would really appreciate any help. I'm fairly new to this type of programming but have been enjoying it. Maybe there's something obvious I'm just not considering; if anyone has ideas for things to try, I'm all ears.

Replies

To be clear, when I refer to the main thread, I'm talking about my application loop, which is driven by the CVDisplayLink callback and for me runs 60 times a second. I have an assertion that stops the program if I miss a frame, so I can confirm it's not missed frames causing this issue.


However, maybe the CVDisplayLink callback isn't being called at exactly 60 times a second (maybe a bit less), because the clock used for the display and the clock used for sound are slightly off?

I just checked, and according to this:

https://developer.apple.com/documentation/corevideo/1457155-cvdisplaylinkgetactualoutputvide?language=objc


It says my actual output video refresh period is 0.01668 seconds, which multiplied by my audio sample rate (44.1 kHz) comes to about 735.58 samples per frame, while right now I am outputting 735 samples per frame because I assumed exactly 60 frames per second (44.1 kHz / 60). This would mean I need to output roughly one additional sample every other frame. Does this make sense? It doesn't seem like a great solution.
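For reference, a small sketch of how that number can be queried and turned into a per-frame sample count; it assumes `link` is a CVDisplayLinkRef created and started elsewhere, and the 44100.0 is the sample rate from the post:

```c
#include <CoreVideo/CoreVideo.h>

// Assumes `link` is an already-running CVDisplayLinkRef.
static double SamplesPerFrame(CVDisplayLinkRef link) {
    double period = CVDisplayLinkGetActualOutputVideoRefreshPeriod(link);
    if (period == 0.0) {      // returns 0 until the link has a measurement
        period = 1.0 / 60.0;  // fall back to the nominal rate
    }
    return 44100.0 * period;  // about 735.6 samples at a 0.01668 s period
}
```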

Three things:


1. You need to be sure that your “main thread” code can produce (create or read) samples fast enough. You didn’t say how you’re doing this, but I assume it can. I mention this only for completeness, in case you haven’t checked.


2. You need to be sure that your “main thread” can enqueue samples fast enough. This means you need to understand the performance impact of your thread-safety technique. You said you're not using locks (which can be slow if you use the wrong kind), but even your atomic accesses could lead to contention. You also need to be sure that display link callbacks are being handled in a timely manner. (If you're using a display link, that's not really the “main thread”.)


3. I think you’re making a big mistake if you’re trying to enqueue samples at the same rate as they’re being dequeued. It’s just asking for trouble. Instead, at each display link callback, you should enqueue as many samples as you can, up to the limit of what the queue can hold and up to the limit of the callback time interval (or some reasonable approximation of it); a sketch of this appears below. It’s actually preferable that many “ticks” of your display link have nothing to do. That’s the point of a buffer, and if it’s not actually buffering, you’re Doing It Wrong(tm).
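A hypothetical sketch of point 3, reusing the RingBuffer from the sketch in the question: each tick simply tops the buffer up to capacity (capping by the tick interval instead would be a straightforward variation). The 440 Hz sine stands in for the poster's constant tone:

```c
#include <math.h>

// Per-tick fill: write as much as the buffer will hold, rather than
// exactly one frame's worth of samples.
static double gPhase = 0.0;

static void FillRingBuffer(RingBuffer *rb) {
    float chunk[256];
    for (;;) {
        uint32_t space = RING_CAPACITY - (uint32_t)rb->count;
        if (space == 0) break;              // buffer full: nothing to do this tick
        uint32_t n = space < 256 ? space : 256;
        for (uint32_t i = 0; i < n; i++) {  // constant sine, as in the post
            chunk[i] = (float)sin(gPhase);
            gPhase += 2.0 * M_PI * 440.0 / 44100.0;
        }
        RingBufferWrite(rb, chunk, n);
    }
}
```

With this approach, a slight mismatch between the display link clock and the audio clock stops mattering: full ticks write nothing, lean ticks write more, and the buffer level absorbs the difference.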

Hi Quincey, thanks for the response! I'll try to reply to your three points.


1. For now I am just outputting a constant sine wave.


2. You're right, it's not really the main thread; that was a poor explanation. I don't think my thread-safety technique is so slow that it's causing an issue. I'm only using one shared variable, to keep track of how many items are in the ring buffer, and I use the OSAtomic.h atomic adds. I'll see if there is an existing fast implementation I can compare against to verify whether this is the issue. As for display link callbacks being handled in a timely manner, I believe they are: I tried running only my audio enqueuing code in the callback and timed it at less than 1 ms.


3. I'm trying to use this for something interactive, like a game, and want to push out each frame of video with that frame's sound. So for each "tick" of my display link callback, I'd like to output the next frame's sound. Is that what you mean by the "limit of the callback time interval"? For me that's about ~735 samples at 44.1 kHz. I'm not sure I fully understand. Right now I'm not yet trying to sync the audio to the video, so I'm outputting many frames ahead, which should avoid the issue of the consuming thread trying to read a sample the producing thread hasn't yet produced. I believe you're correct that I'm trying to enqueue samples at the same rate as the consuming audio thread dequeues them, but I don't see any way around that for something interactive like a game. Do you have any advice regarding this?


Big thanks again for the response, I really appreciate your helping me.

The display link callback rate is the average rate at which you need to push out audio samples, but you're not working with a truly real-time system, so you typically need to prepare more samples in advance and queue/buffer them.


What APIs are you using to play the sound?

I believe I am using the Audio HAL APIs:

https://developer.apple.com/documentation/coreaudio?language=objc


And here is sample code that showed me how to use the API:

https://developer.apple.com/library/content/samplecode/HALExamples/Introduction/Intro.html
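For context, a rough sketch of that style of HAL setup is below. It is a hypothetical reconstruction, not the poster's actual code: it reuses the RingBuffer sketch from the question, omits all error checking, and assumes the device's first stream is 32-bit float samples.

```c
#include <CoreAudio/CoreAudio.h>
#include <string.h>

// IOProc: the HAL calls this on its own thread when it needs more audio.
static OSStatus MyIOProc(AudioObjectID inDevice,
                         const AudioTimeStamp *inNow,
                         const AudioBufferList *inInputData,
                         const AudioTimeStamp *inInputTime,
                         AudioBufferList *outOutputData,
                         const AudioTimeStamp *inOutputTime,
                         void *inClientData) {
    RingBuffer *rb = (RingBuffer *)inClientData;
    AudioBuffer *buf = &outOutputData->mBuffers[0];   // first stream only (assumed)
    uint32_t wanted = buf->mDataByteSize / sizeof(float);
    uint32_t got = RingBufferRead(rb, (float *)buf->mData, wanted);
    if (got < wanted) {  // zero-fill on underrun instead of playing stale data
        memset((float *)buf->mData + got, 0, (wanted - got) * sizeof(float));
    }
    return noErr;
}

static void StartPlayback(RingBuffer *rb) {
    // Look up the default output device.
    AudioObjectID device = kAudioObjectUnknown;
    UInt32 size = sizeof(device);
    AudioObjectPropertyAddress addr = {
        kAudioHardwarePropertyDefaultOutputDevice,
        kAudioObjectPropertyScopeGlobal,
        kAudioObjectPropertyElementMaster
    };
    AudioObjectGetPropertyData(kAudioObjectSystemObject, &addr, 0, NULL,
                               &size, &device);

    // Register the callback and start the device.
    AudioDeviceIOProcID procID = NULL;
    AudioDeviceCreateIOProcID(device, MyIOProc, rb, &procID);
    AudioDeviceStart(device, procID);  // callbacks begin on a HAL thread
}
```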

You wrote: "It says my actual output video refresh period is 0.01668 seconds, which multiplied by my audio sample rate (44.1 kHz) comes to about 735.58 samples per frame, while right now I am outputting 735 samples per frame ..."


So add a periodic monitor of the buffer level. If, every 100 frames, you find your average buffer level has dropped by around 58 samples or so, then you have your cause and a solution.


Your code needs to monitor the buffer level and maintain a floating-point accumulation of the number of samples needed to keep the buffer at the level required to meet your latency and error-concealment requirements. Since you have a floating-point accumulator but can only fill an integer number of samples, expect some jitter in the number of samples your fill routine needs to produce. Also, the async threads won't always line up in exactly the same periodic sequence, which can cause additional jitter.
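A minimal sketch of that accumulator, under the same assumptions as the earlier snippets (samplesPerFrame would come from the measured refresh period, e.g. the SamplesPerFrame() sketch above):

```c
#include <stdint.h>

// Carry the fractional sample debt forward so the long-run average
// matches the measured ~735.59 samples per frame, with integer fills.
static double gSampleDebt = 0.0;

static uint32_t SamplesToProduceThisFrame(double samplesPerFrame) {
    gSampleDebt += samplesPerFrame;          // e.g. += 735.588...
    uint32_t whole = (uint32_t)gSampleDebt;  // 735 on most frames, 736 on some
    gSampleDebt -= (double)whole;            // keep the fraction for next frame
    return whole;
}
```

Each display link tick would then produce SamplesToProduceThisFrame(samplesPerFrame) samples, so the occasional 736-sample frame makes up the deficit that a fixed 735 leaves behind.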