Coordination of Video Capture and Audio Engine Start in iOS Development

Question:

When implementing simultaneous video capture and audio processing in an iOS app, does the order of starting these components matter, or can they be initiated in any sequence?

I have an actor responsible for initiating video capture using the setCaptureMode function. In this actor I also call startAudioEngine to start the audio engine and register a results observer. The audio engine starts successfully, but the results observer is never invoked when startAudioEngine is called synchronously; it works correctly when I wrap the call in a Task.

Could you please explain why the synchronous call to startAudioEngine might prevent the results observer from being invoked? What is the best practice for ensuring both components work together? Additionally, if I were to avoid using Task, what approach would be required? Lastly, does startAudioEngine take effect from the start time of the video capture (00:00)?

Platform: Xcode 16, Swift 6, iOS 18

References:

  1. Classifying Sounds in an Audio Stream – In my case, the analyzeAudio() method is not invoked.
  2. Setting Up a Capture Session – Here, the focus is on video capture.
  3. Classifying Sounds in an Audio File

Code Snippet (for further detail; setVideoCaptureMode() is where the problem surfaces):


// Ensures all operations happen off of the `@MainActor`.
actor CaptureService {
    ...

    nonisolated private let resultsObserver1 = ResultsObserver1()
    ...

    private func setUpSession() throws { ... }
    ...

    func setVideoCaptureMode() throws {
        captureSession.beginConfiguration()
        defer { captureSession.commitConfiguration() }

        /* -- Works fine (analyzeAudio is printed)
        Task {
            self.resultsObserver1.startAudioEngine()
        }
        */

        self.resultsObserver1.startAudioEngine()  // Does not work - analyzeAudio is not printed

        captureSession.sessionPreset = .high
        try addOutput(movieCapture.output)
        if isHDRVideoEnabled {
            setHDRVideoEnabled(true)
        }

        updateCaptureCapabilities()
    }
}


/////////////////////////
/////////////////////////

import AVFoundation
import SoundAnalysis

class ResultsObserver1 {
    let resultsObserver2 = ResultsObserver2()

    var classifiedText: String = ""
    var confidence: Double = 0.0
    private var audioEngine: AVAudioEngine?
    private var soundAnalyzer: SNAudioStreamAnalyzer?
    private var inputFormat: AVAudioFormat?
    let analysisQueue = DispatchQueue(label: "com.example.AnalysisQueue")

    func analyzeAudio(buffer: AVAudioBuffer, at time: AVAudioTime) {
        print("analyzeAudio")
        analysisQueue.async {
            print("analyze")
            self.soundAnalyzer?.analyze(buffer,
                                        atAudioFramePosition: time.sampleTime)
        }
    }

    func stopAudioEngine() {
        print("stopAudioEngine")
        soundAnalyzer?.removeAllRequests()
        audioEngine?.inputNode.removeTap(onBus: 0)
        audioEngine?.stop()
        soundAnalyzer = nil
        audioEngine = nil
        inputFormat = nil
    }
    
    // Setup audio analysis using SNAudioStreamAnalyzer
    func startAudioEngine() {
        print("startAudioEngine")
        // Create a new audio engine.
        audioEngine = AVAudioEngine()
        print("audioEngine: \(String(describing: audioEngine))")


        // Get the native audio format of the engine's input bus.
        let inputBus = AVAudioNodeBus(0)
        inputFormat = audioEngine?.inputNode.inputFormat(forBus: inputBus)
        print("inputFormat: \(String(describing: inputFormat))")
        
        guard let inputFormat = inputFormat else {
            print("Failed to get input format")
            return
        }

        do {
            // Start the stream of audio data.
            try audioEngine?.start()
            print("audio engine started")
        } catch {
            print("Unable to start AVAudioEngine: \(error.localizedDescription)")
        }
        
        // Create a new stream analyzer.
        soundAnalyzer = SNAudioStreamAnalyzer(format: inputFormat)
        print("soundAnalyzer: \(String(describing: soundAnalyzer))")
        
        // Use Apple's built-in classifier version 1
        let version1 = SNClassifierIdentifier.version1

        do {
            // Create a classification request for version 1
            let request = try SNClassifySoundRequest(classifierIdentifier: version1)
            
            // Add a sound classification request that reports to an observer.
            try soundAnalyzer?.add(request,
                                   withObserver: resultsObserver2)
            print("Added request to soundAnalyzer with the specified observer")
        } catch {
            print("Error setting up sound analysis: \(error)")
            return
        }
        
        audioEngine?.inputNode.installTap(onBus: 0, bufferSize: 8192, format: inputFormat, block: { buffer, when in
            self.analyzeAudio(buffer: buffer, at: when)
        })
        
        /*
        audioEngine?.inputNode.installTap(onBus: 0, bufferSize: 8192, format: inputFormat, block: { buffer, when in
            print("buffer recieved")
            self.soundAnalyzer?.analyze(buffer, atAudioFramePosition: AVAudioFramePosition(buffer.frameLength))
        })
         */

        do {
            // Prepare and start the audio engine
            audioEngine?.prepare()
            try audioEngine?.start()
        } catch {
            print("Error starting audio engine: \(error)")
        }
        
    }

}

class ResultsObserver2: NSObject, SNResultsObserving {

    /// Notifies the observer when a request generates a prediction.
    func request(_ request: SNRequest, didProduce result: SNResult) {
        // Downcast the result to a classification result.
        guard let result = result as? SNClassificationResult else  { return }


        // Get the prediction with the highest confidence.
        guard let classification = result.classifications.first else { return }


        // Get the starting time.
        let timeInSeconds = result.timeRange.start.seconds


        // Convert the time to a human-readable string.
        let formattedTime = String(format: "%.2f", timeInSeconds)
        print("Analysis result for audio at time: \(formattedTime)")


        // Convert the confidence to a percentage string.
        let percent = classification.confidence * 100.0
        let percentString = String(format: "%.2f%%", percent)


        // Print the classification's name (label) with its confidence.
        print("\(classification.identifier): \(percentString) confidence.\n")
    }


    /// Notifies the observer when a request generates an error.
    func request(_ request: SNRequest, didFailWithError error: Error) {
        print("The analysis failed: \(error.localizedDescription)")
    }


    /// Notifies the observer when a request is complete.
    func requestDidComplete(_ request: SNRequest) {
        print("The request completed successfully!")
    }

}

Hello @Blume,

I'm happy to take a look into this. There is a lot of code here; would you be able to provide a focused sample project that reproduces the issue you are seeing?

Best regards,

Greg

Hi Greg,

Thank you for taking the time to look into this.

Unfortunately, I wasn't able to upload the file directly because the "Add File" option wouldn't let me select the 19.9 MB compressed zip file. To make it accessible, I've uploaded the file to an S3 bucket, and you can download it here: https://curvsort-public.s3.ap-south-1.amazonaws.com/AVCamBuildingACameraApp_1.zip. Please let me know once you've downloaded it, and I will delete it from the bucket.

The key files with the focused issue are ESoundClassifier and CaptureService (specifically resultsObserver1 and the sections marked Works 1/2/3).

To Reproduce the Issue:

I am running the app on an iPhone 11 with Swift 5 and Xcode 16.

  • Problem Case: No logs appear for the analyze and analyzeAudio methods.
  • Working Case: To observe the working flow, comment out the code at // Does not work 1 and uncomment the three lines at // Works 3. You should then see logs for both analyze and analyzeAudio.

Swift 6 Concurrency Issues:

After switching the build setting to Swift 6, the following concurrency-related errors are reported:

  1. ThumbnailButton Issue: The PhotosPicker throws an error when displaying the thumbnail: "Main actor-isolated property 'thumbnail' cannot be referenced from a Sendable closure."
  2. AVAudioBuffer: AVAudioBuffer is also flagged as a non-Sendable type, which causes concurrency errors.

Apple Source Code References:

  1. AVCam: Building a Camera App
  2. Classifying Sounds in an Audio Stream

I hope this provides the necessary context. Let me know if you need any further details.

Best regards

Hello @Blume,

In the case where it is "not working", the AVAudioEngine is no longer running. It appears that, if you start the audio engine within the AVCaptureSession configuration block, the audio engine will stop running when you commit the configuration change:


defer {
    print(resultsObserver1.audioEngine?.isRunning) // true
    captureSession.commitConfiguration()
    print(resultsObserver1.audioEngine?.isRunning) // false
}

I recommend that you start the audio engine either before or after configuring the capture session.
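
For example, a rough sketch of the "after" option, assuming the same CaptureService actor and setVideoCaptureMode() shown in the question (the wrapper function name below is hypothetical):

// Hypothetical call site inside the CaptureService actor. setVideoCaptureMode()
// still commits its configuration via `defer`, so by the time it returns,
// commitConfiguration has already run. Starting the audio engine only after
// that point keeps it entirely outside the begin/commit block, with no Task needed.
func startVideoCaptureAndClassification() throws {
    try setVideoCaptureMode()              // beginConfiguration ... commitConfiguration
    resultsObserver1.startAudioEngine()    // engine starts after the commit
}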

Best regards,

Greg

Accepted Answer

Hey @Blume,

  1. Why does the audioEngine stop within the configuration block?

Good question. AVCaptureDevice and AVAudioEngine both have access to the same underlying audio capture hardware. It appears that, for whatever reason, this particular pattern is creating a conflict of sorts. You should file a bug report for this issue using Feedback Assistant.

  2. What specifically makes the “working case” (when called under Task) function correctly, as opposed to the “non-working case” (when called synchronously)?

The Task introduces asynchronous execution. If you add logging to beginConfiguration, commitConfiguration, and inside your Task, you will see that startAudioEngine (in the Task) does not execute until after the call to commitConfiguration. In contrast, without the Task, startAudioEngine executes between the beginConfiguration and commitConfiguration calls (which appears to be problematic).
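
For reference, a rough sketch of that logging, assuming the setVideoCaptureMode() from the question (the print statements are illustrative additions, not part of the sample project):

func setVideoCaptureMode() throws {
    captureSession.beginConfiguration()
    print("beginConfiguration")
    defer {
        print("commitConfiguration")
        captureSession.commitConfiguration()
    }

    Task {
        // This Task is scheduled on the actor, so it cannot run until the current
        // actor-isolated call finishes, i.e. after the defer above has already
        // committed the configuration.
        print("startAudioEngine (Task)")
        self.resultsObserver1.startAudioEngine()
    }

    // ... rest of the configuration ...
}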

Best regards,

Greg
