How can I improve the speed of running a `VNDetectHumanBodyPoseRequest` on a `VNImageRequestHandler` for every `CMSampleBuffer` of an imported video?

Below, the sampleBufferProcessor closure is where the Vision body pose detection occurs.

/// Transfers the sample data from the AVAssetReaderOutput to the AVAssetWriterInput,
/// processing via a CMSampleBufferProcessor.
/// - Parameters:
///   - readerOutput: The source sample data.
///   - writerInput: The destination for the sample data.
///   - queue: The DispatchQueue.
///   - completionHandler: The completion handler to run when the transfer finishes.
/// - Tag: transferSamplesAsynchronously
private func transferSamplesAsynchronously(from readerOutput: AVAssetReaderOutput,
                                           to writerInput: AVAssetWriterInput,
                                           onQueue queue: DispatchQueue,
                                           sampleBufferProcessor: SampleBufferProcessor,
                                           completionHandler: @escaping () -> Void) {
    // The writerInput continuously invokes this closure until finished or
    // cancelled. It throws an NSInternalInconsistencyException if called more
    // than once for the same writer.
    writerInput.requestMediaDataWhenReady(on: queue) {
        var isDone = false

        // While the writerInput accepts more data, process the sampleBuffer
        // and then transfer the processed sample to the writerInput.
        while writerInput.isReadyForMoreMediaData {
            if self.isCancelled {
                isDone = true
                break
            }

            // Get the next sample from the asset reader output.
            guard let sampleBuffer = readerOutput.copyNextSampleBuffer() else {
                // The asset reader output has no more samples to vend.
                isDone = true
                break
            }

            // Process the sample, if requested.
            do {
                try sampleBufferProcessor?(sampleBuffer)
            } catch {
                // The `readingAndWritingDidFinish()` function picks up this
                // error.
                self.sampleTransferError = error
                isDone = true
                break
            }

            // Append the sample to the asset writer input.
            guard writerInput.append(sampleBuffer) else {
                // The writer could not append the sample buffer.
                // The `readingAndWritingDidFinish()` function handles any
                // error information from the asset writer.
                isDone = true
                break
            }
        }

        if isDone {
            // Calling `markAsFinished()` on the asset writer input does the
            // following:
            // 1. Unblocks any other inputs needing more samples.
            // 2. Cancels further invocations of this "request media data"
            //    callback block.
            writerInput.markAsFinished()

            // Tell the caller the reader output and writer input finished
            // transferring samples.
            completionHandler()
        }
    }
}

The processor closure runs body pose detection on every sample buffer. The `VNDetectHumanBodyPoseRequest` completion handler then feeds the resulting `VNHumanBodyPoseObservation` results into a custom Core ML action classifier.

private func videoProcessorForActivityClassification() -> SampleBufferProcessor {
    let videoProcessor: SampleBufferProcessor = { sampleBuffer in
        do {
            let requestHandler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer)
            try requestHandler.perform([self.detectHumanBodyPoseRequest])
        } catch {
            print("Unable to perform the request: \(error.localizedDescription).")
        }
    }
    return videoProcessor
}

How could I improve the performance of this pipeline? When I tested it on an hour-long 4K video at 60 FPS, processing took several hours running as a Mac Catalyst app on an M1 Max.
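
For a sense of scale, here is a rough sketch of the workload those numbers imply. The duration and frame rate come from the test above; the `AnalysisWorkload` type, the three-hour total (standing in for "several hours"), and the 15 FPS analysis cadence are assumptions for illustration only. Whether the action classifier tolerates poses sampled below the source frame rate would need checking.

```swift
import Foundation

/// Hypothetical helper type (not part of the pipeline above) that estimates
/// how many frames a pose request has to process.
struct AnalysisWorkload {
    var durationSeconds: Double   // length of the video
    var sourceFPS: Double         // frame rate of the video
    var analysisFPS: Double       // assumed rate at which poses are needed

    /// Total frames in the video.
    var totalFrames: Int { Int(durationSeconds * sourceFPS) }

    /// Analyze every `frameStride`-th frame to approximate `analysisFPS`.
    var frameStride: Int { max(1, Int(sourceFPS / analysisFPS)) }

    /// Frames that actually reach Vision when skipping by `frameStride`.
    var analyzedFrames: Int { (totalFrames + frameStride - 1) / frameStride }

    /// True if the frame at `frameIndex` should be analyzed.
    func shouldAnalyze(frameIndex: Int) -> Bool {
        frameIndex % frameStride == 0
    }

    /// Average wall-clock time spent per source frame.
    func secondsPerFrame(totalProcessingSeconds: Double) -> Double {
        totalProcessingSeconds / Double(totalFrames)
    }
}

// One hour at 60 FPS, with an assumed 15 FPS analysis cadence.
let workload = AnalysisWorkload(durationSeconds: 3_600, sourceFPS: 60, analysisFPS: 15)
print(workload.totalFrames)      // 216000 frames in the source video
print(workload.analyzedFrames)   // 54000 frames when analyzing every 4th frame
// Assumed three hours of processing works out to about 50 ms per frame.
print(workload.secondsPerFrame(totalProcessingSeconds: 3 * 3_600))
```

Under these assumptions, every frame costs about 50 ms on average, and analyzing every fourth frame would cut the number of Vision requests to a quarter.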


I would benchmark it with Instruments first to see where the time is spent. The code snippet alone doesn't show where the bottleneck is.
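
One way to make the per-frame cost visible in Instruments is to wrap the Vision call in signpost intervals. This is a minimal sketch, not a definitive setup: the `withSignpost` helper, the `poseLog` constant, and the subsystem string are hypothetical names; `os_signpost`, `OSLog`, and `OSSignpostID` are the standard `os` APIs.

```swift
import Foundation
import os.signpost

// Hypothetical subsystem name. The `.pointsOfInterest` category makes the
// intervals appear in the Points of Interest instrument.
let poseLog = OSLog(subsystem: "com.example.poseprocessing",
                    category: .pointsOfInterest)

/// Hypothetical helper: runs `work` inside a signpost interval called `name`
/// and returns (or rethrows) whatever `work` produces.
func withSignpost<T>(_ name: StaticString, _ work: () throws -> T) rethrows -> T {
    let id = OSSignpostID(log: poseLog)
    os_signpost(.begin, log: poseLog, name: name, signpostID: id)
    // `defer` ends the interval on both the normal and the throwing path.
    defer { os_signpost(.end, log: poseLog, name: name, signpostID: id) }
    return try work()
}

// Usage inside the sample buffer processor from the question:
//
//     try withSignpost("BodyPose") {
//         try requestHandler.perform([self.detectHumanBodyPoseRequest])
//     }
```

Wrapping `copyNextSampleBuffer()` and `append(_:)` in the same way would show whether decoding, pose detection, or writing dominates the run.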

  • Thanks, good idea. One way I've improved speed so far is by simply using VNVideoProcessor to do a lot of the heavy lifting. I somehow missed that it exists in the WWDC sessions, oops!
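
    For anyone else landing here, this is roughly the shape of the `VNVideoProcessor` approach. It is a sketch under assumptions, not my exact code: `makeProcessingOptions` and `analyzeBodyPoses` are hypothetical helper names, the cadence value is up to you, and it only covers analysis (it does not write samples back out like the reader/writer pipeline above).

```swift
import AVFoundation
import Vision

/// Hypothetical helper: builds processing options that sample the video at
/// `framesPerSecond` instead of analyzing every frame.
func makeProcessingOptions(framesPerSecond: Int) -> VNVideoProcessor.RequestProcessingOptions {
    let options = VNVideoProcessor.RequestProcessingOptions()
    options.cadence = VNVideoProcessor.FrameRateCadence(framesPerSecond)
    return options
}

/// Hypothetical helper: runs body pose detection over `timeRange` of the
/// video at `url` and hands each set of observations to `handler`.
func analyzeBodyPoses(in url: URL,
                      timeRange: CMTimeRange,
                      framesPerSecond: Int,
                      handler: @escaping ([VNHumanBodyPoseObservation]) -> Void) throws {
    // The completion handler runs for each frame the processor analyzes.
    let request = VNDetectHumanBodyPoseRequest { request, error in
        guard error == nil,
              let observations = request.results as? [VNHumanBodyPoseObservation] else {
            return
        }
        handler(observations)
    }

    let processor = VNVideoProcessor(url: url)
    try processor.addRequest(request,
                             processingOptions: makeProcessingOptions(framesPerSecond: framesPerSecond))

    // To my understanding this call blocks until the range has been
    // analyzed, so it should run off the main thread.
    try processor.analyze(timeRange)
}
```

    The time range can be built from the asset's duration, for example `CMTimeRange(start: .zero, duration: duration)`. Lowering `framesPerSecond` trades temporal resolution for speed, so the right value depends on what the classifier was trained on.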
