iOS: Recording from two AVCaptureSessions is out of sync

Hey all!

I'm trying to record Video from one AVCaptureSession, and Audio from another AVCaptureSession.

I'm using two separate capture sessions because I want to disable and enable the Audio one on the fly without interrupting the Video session.

I believe Snapchat and Instagram also use this approach, as background music keeps playing when you open the Camera, and only slightly stutters (caused by the AVAudioSession.setCategory(..) call) once you start recording.

However, I couldn't manage to synchronize the two AVCaptureSessions, and whenever I record their CMSampleBuffers into an AVAssetWriter, the video and audio frames are out of sync.

Here's a quick YouTube video showcasing the offset: https://youtube.com/shorts/jF1arThiALc

I notice two bugs:

  1. The video and audio tracks are out of sync - video frames start almost a second before the first audio sample starts to be played back, and towards the end the delay is also noticeable because the video stops / freezes while the audio continues to play.
  2. The video contains frames from BEFORE I even pressed startRecording(), as if my iPhone had a time machine!

I am not sure how the second one can even happen, so at this point I'm asking for help if anyone has any experience with that.

Roughly my code:

let videoCaptureSession = AVCaptureSession()
let audioCaptureSession = AVCaptureSession()

func setup() {
  // ...adding videoCaptureSession outputs (AVCaptureVideoDataOutput)
  // ...adding audioCaptureSession outputs (AVCaptureAudioDataOutput)
  videoCaptureSession.startRunning()
}

func startRecording() throws {
  // AVAssetWriter's initializer can throw, so it needs `try`
  self.assetWriter = try AVAssetWriter(outputURL: tempURL, fileType: .mov)
  self.videoWriter = AVAssetWriterInput(...)
  assetWriter.add(videoWriter)
  self.audioWriter = AVAssetWriterInput(...)
  assetWriter.add(audioWriter)

  // setCategory(_:options:) can throw as well
  try AVAudioSession.sharedInstance().setCategory(.playAndRecord, options: [.mixWithOthers, .defaultToSpeaker])
  audioCaptureSession.startRunning() // <-- started lazily, only when recording begins
}

func captureOutput(_ captureOutput: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from _: AVCaptureConnection) {
  // Record Video Frame/Audio Sample to File in custom `RecordingSession` (AVAssetWriter)
  if isRecording {
    switch captureOutput {
    case is AVCaptureVideoDataOutput:
      self.videoWriter.append(sampleBuffer)
    case is AVCaptureAudioDataOutput:
      // TODO: Do I need to update the PresentationTimestamp here to synchronize it to the other capture session? or not?
      self.audioWriter.append(sampleBuffer)
    default:
      break
    }
  }
}

Full code here:

  1. Video Capture Session Configuration
  2. Audio Capture Session Configuration
  3. Later on, startRecording() call
  4. RecordingSession, my AVAssetWriter abstraction
  5. Audio Session activation
  6. And finally, writing the CMSampleBuffers

Answered by mrousavy in 772797022

Oh my god, I finally found the issue.

The Video Stabilization mode cinematicExtended actually caused a huge delay in the video pipeline because it keeps an internal buffer of frames to stabilize them, and that takes time: apparently up to 1 second on my iPhone 15 Pro. My RecordingSession simply wrote the video and audio buffers to the file as soon as I received them, using each buffer's presentation timestamp as its timestamp in the file.

What I didn't account for was that the presentation timestamp is not the same as the current time: the frame might be older, and we only receive it after it has gone through a bunch of processing (stabilization)!
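To see that gap for yourself, you can compare a buffer's presentation timestamp with the current clock time when it reaches the delegate. This is only a minimal sketch: pipelineLatency is a hypothetical helper name (it is not from my PR), and it assumes both times are on the same clock. The host time clock is my assumption here, so check which clock your capture session actually uses.

```swift
import CoreMedia

/// Hypothetical helper: how old a buffer already is when it is delivered.
/// Assumes `presentationTime` and `now` are expressed on the same clock.
func pipelineLatency(presentationTime: CMTime, now: CMTime) -> Double {
    // now - presentationTime, converted to seconds
    return CMTimeGetSeconds(CMTimeSubtract(now, presentationTime))
}

// Possible use inside captureOutput(_:didOutput:from:):
//   let pts = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
//   let now = CMClockGetTime(CMClockGetHostTimeClock()) // assumption: session runs on the host clock
//   print("buffer is \(pipelineLatency(presentationTime: pts, now: now))s old")
```

With a heavy stabilization mode enabled, the printed value for video buffers should be noticeably larger than for audio buffers.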

E.g. relative to the timer we see on screen: when I start recording, the time is 5:00, but the first frame that arrives at 5:00 still has a presentation timestamp of 3:86, because that's when it was captured by the Camera. My mistake was writing this frame to the file, which I shouldn't do, because it's from the past!

My PR (linked below) fixes that behaviour and only writes frames to the file whose presentation timestamp is no earlier than the moment we actually started recording. The first frame that should be written to the file is the one with the presentation timestamp 5:00, which might only arrive when the current time is already 6:86, so it's 1:86 seconds late.
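The idea can be sketched roughly like this. It is a simplified illustration and not the actual code from the PR: RecordingGate, startTime, stopTime and shouldWrite are hypothetical names, and it assumes the start time is read from the same clock that stamps the sample buffers.

```swift
import CoreMedia

/// Hypothetical helper: decides whether a buffer belongs in the file,
/// based on its presentation timestamp instead of its arrival time.
struct RecordingGate {
    /// Clock time at which recording was started.
    let startTime: CMTime
    /// Clock time at which recording was stopped; nil while still recording.
    var stopTime: CMTime? = nil

    func shouldWrite(presentationTime: CMTime) -> Bool {
        // Frames captured before the start are "from the past": drop them.
        if CMTimeCompare(presentationTime, startTime) < 0 { return false }
        // Frames captured after the stop request are dropped as well.
        if let stop = stopTime, CMTimeCompare(presentationTime, stop) > 0 { return false }
        return true
    }
}

// Possible use inside captureOutput(_:didOutput:from:):
//   let pts = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
//   if gate.shouldWrite(presentationTime: pts) { videoWriter.append(sampleBuffer) }
```

Because late frames keep arriving after stop is pressed, the writer in this sketch would have to stay open until a buffer with a timestamp at or past stopTime has been seen on each track, rather than being finished immediately.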

Here's my PR with code which fixes the issue: https://github.com/mrousavy/react-native-vision-camera/pull/2206

What a funny bug, I was pulling my hair out on this one.
