I'm building a Camera app where I have two AVCaptureSessions, one for video and one for audio. (See this for an explanation of why I don't just use one.) I receive my CMSampleBuffers in the AVCaptureVideoDataOutput and AVCaptureAudioDataOutput delegates.
Now, when I enable the video stabilization mode "cinematicExtended", the AVCaptureVideoDataOutput has a 1-2 second delay, meaning I receive my audio CMSampleBuffers 1-2 seconds earlier than my video CMSampleBuffers!
This is the code:
func captureOutput(_ captureOutput: AVCaptureOutput,
                   didOutput sampleBuffer: CMSampleBuffer,
                   from _: AVCaptureConnection) {
    let type = captureOutput is AVCaptureVideoDataOutput ? "Video" : "Audio"
    let timestamp = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
    print("Incoming \(type) frame at \(timestamp.seconds) seconds...")
}
Without video stabilization, this logs:
Incoming Audio frame at 107862.52558333334 seconds...
Incoming Video frame at 107862.535921166 seconds...
Incoming Audio frame at 107862.54691666667 seconds...
Incoming Video frame at 107862.569257333 seconds...
Incoming Audio frame at 107862.56825 seconds...
Incoming Video frame at 107862.585925333 seconds...
Incoming Audio frame at 107862.58958333333 seconds...
With video stabilization, this logs:
Incoming Audio frame at 107862.52558333334 seconds...
Incoming Video frame at 107861.535921166 seconds...
Incoming Audio frame at 107862.54691666667 seconds...
Incoming Video frame at 107861.569257333 seconds...
Incoming Audio frame at 107862.56825 seconds...
Incoming Video frame at 107861.585925333 seconds...
Incoming Audio frame at 107862.58958333333 seconds...
As you can see, with stabilization enabled the video frames are delivered almost a full second after their presentation timestamps!
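Doing the arithmetic on two adjacent lines of the stabilized log makes the skew concrete (audio buffers arrive in realtime, so an audio timestamp is a good proxy for "now"):

```swift
// Two adjacent lines from the stabilized log above: an audio buffer,
// then the video buffer delivered immediately after it.
let audioPTS = 107862.54691666667  // realtime, so this ≈ wall-clock "now"
let videoPTS = 107861.569257333
let delay = audioPTS - videoPTS    // ≈ 0.98 s
print("Video is delivered ~\((delay * 100).rounded() / 100) s after its presentation time.")
```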
There are a few guides on how to use AVAssetWriter online, but all of them recommend starting the AVAssetWriter session once the first video frame arrives. In my case I cannot do that, since due to the stabilization delay the first ~1 second of video frames is from before the user even started the recording.
I also can't really wait 1 second here, as I would then lose 1 second of audio samples - those arrive in realtime and are not delayed.
I also can't really start the session on the first audio frame and drop all video frames up to that point, since the resulting video would then start with a blank frame - no video frame ever lands exactly on that first audio timestamp.
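To make that trade-off concrete, here is a plain-Swift model of the situation (Double timestamps stand in for CMTime; all names and numbers are mine, chosen to mirror the logs above):

```swift
// A buffer delivered by one of the data outputs, reduced to what matters here.
struct Sample {
    let isVideo: Bool
    let pts: Double      // presentation timestamp (seconds)
    let arrival: Double  // when the delegate received it (seconds)
}

// The user presses record at t = 100.0. Audio arrives in realtime; video
// arrives ~1 s late, so the first delivered video frames were captured
// before the recording even started.
let samples: [Sample] = [
    Sample(isVideo: false, pts: 100.00, arrival: 100.00),
    Sample(isVideo: false, pts: 100.02, arrival: 100.02),
    Sample(isVideo: true,  pts:  99.05, arrival: 100.05), // captured pre-record!
    Sample(isVideo: true,  pts: 100.01, arrival: 101.01),
]

// Option 1 (what the guides suggest): start at the first video frame's pts.
let firstVideoPTS = samples.first { $0.isVideo }!.pts
// → 99.05, i.e. ~1 s *before* the user pressed record.

// Option 2: start at the first audio pts and drop earlier video frames.
let firstAudioPTS = samples.first { !$0.isVideo }!.pts
let keptVideo = samples.filter { $0.isVideo && $0.pts >= firstAudioPTS }
// → only the frame at pts 100.01 survives; nothing covers 100.00-100.01.
```

Either starting point is wrong: option 1 writes ~1 second of pre-record footage, and option 2 opens the file with a video gap before the first surviving frame.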
Any advice on how I can synchronize these?
Here is my code: RecordingSession.swift