How can I improve the speed of running a `VNDetectHumanBodyPoseRequest` on a `VNImageRequestHandler` for every `CMSampleBuffer` of an imported video?

In the code below, the `sampleBufferProcessor` closure is where the Vision body pose detection runs.

```swift
/// Transfers the sample data from the AVAssetReaderOutput to the AVAssetWriterInput,
/// processing each sample via a SampleBufferProcessor.
///
/// - Parameters:
///   - readerOutput: The source sample data.
///   - writerInput: The destination for the sample data.
///   - queue: The queue on which the writer input invokes the media-data request block.
///   - sampleBufferProcessor: An optional closure that processes each sample before it's appended.
///   - completionHandler: The completion handler to run when the transfer finishes.
/// - Tag: transferSamplesAsynchronously
private func transferSamplesAsynchronously(from readerOutput: AVAssetReaderOutput,
                                           to writerInput: AVAssetWriterInput,
                                           onQueue queue: DispatchQueue,
                                           sampleBufferProcessor: SampleBufferProcessor,
                                           completionHandler: @escaping () -> Void) {
    /*
     The writerInput continuously invokes this closure until finished or
     cancelled. It throws an NSInternalInconsistencyException if called more
     than once for the same writer input.
    */
    writerInput.requestMediaDataWhenReady(on: queue) {
        var isDone = false

        /*
         While the writerInput accepts more data, process the sampleBuffer
         and then transfer the processed sample to the writerInput.
        */
        while writerInput.isReadyForMoreMediaData {
            if self.isCancelled {
                isDone = true
                break
            }

            // Get the next sample from the asset reader output.
            guard let sampleBuffer = readerOutput.copyNextSampleBuffer() else {
                // The asset reader output has no more samples to vend.
                isDone = true
                break
            }

            // Process the sample, if requested.
            do {
                try sampleBufferProcessor?(sampleBuffer)
            } catch {
                /*
                 The `readingAndWritingDidFinish()` function picks up this
                 error. Stop here so the failed sample isn't appended.
                */
                self.sampleTransferError = error
                isDone = true
                break
            }

            // Append the sample to the asset writer input.
            guard writerInput.append(sampleBuffer) else {
                /*
                 The writer could not append the sample buffer.
                 The `readingAndWritingDidFinish()` function handles any
                 error information from the asset writer.
                */
                isDone = true
                break
            }
        }

        if isDone {
            /*
             Calling `markAsFinished()` on the asset writer input does the
             following:
             1. Unblocks any other inputs needing more samples.
             2. Cancels further invocations of this "request media data"
             callback block.
            */
            writerInput.markAsFinished()

            /*
             Tell the caller the reader output and writer input finished
             transferring samples.
            */
            completionHandler()
        }
    }
}
```
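The call site `try sampleBufferProcessor?(sampleBuffer)` suggests that `SampleBufferProcessor` is an optional throwing closure, roughly `((CMSampleBuffer) throws -> Void)?`; that is an assumption, since its declaration isn't shown. Because the processor is just a closure, one low-effort way to separate Vision's share of each loop iteration from decoding and encoding is to wrap it with a timer. The sketch below is generic so it runs without AVFoundation; `FrameTimer` and `timedProcessor` are hypothetical names, and the timer assumes every call arrives on the single queue passed to `requestMediaDataWhenReady(on:)`.

```swift
import Dispatch

/// Accumulates wall-clock time spent inside a wrapped processor.
/// Hypothetical helper; not thread-safe, so call it from one serial queue.
final class FrameTimer {
    private(set) var frameCount = 0
    private(set) var totalNanoseconds: UInt64 = 0

    /// Average time per processed frame, in milliseconds.
    var averageMilliseconds: Double {
        frameCount == 0 ? 0 : Double(totalNanoseconds) / Double(frameCount) / 1_000_000
    }

    func record(_ nanoseconds: UInt64) {
        frameCount += 1
        totalNanoseconds += nanoseconds
    }
}

/// Wraps an optional throwing processor (the shape assumed for
/// `SampleBufferProcessor`) so every call is timed, including calls that throw.
func timedProcessor<Sample>(_ processor: ((Sample) throws -> Void)?,
                            timer: FrameTimer) -> ((Sample) throws -> Void)? {
    // Keep "no processor" as nil so the transfer loop still skips processing.
    guard let processor = processor else { return nil }
    return { sample in
        let start = DispatchTime.now().uptimeNanoseconds
        // Record the elapsed time whether the processor returns or throws.
        defer { timer.record(DispatchTime.now().uptimeNanoseconds - start) }
        try processor(sample)
    }
}
```

Under the assumed type alias, `timedProcessor(videoProcessorForActivityClassification(), timer: timer)` can be passed as `sampleBufferProcessor`; comparing `timer.averageMilliseconds` with the total wall-clock time per frame shows whether Vision dominates.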

The processor closure runs body pose detection on every sample buffer. Later, in the `VNDetectHumanBodyPoseRequest` completion handler, the resulting `VNHumanBodyPoseObservation` results are fed into a custom Core ML action classifier.

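```swift
private func videoProcessorForActivityClassification() -> SampleBufferProcessor {
    let videoProcessor: SampleBufferProcessor = { sampleBuffer in
        do {
            // A request handler is bound to a single image, so create one per frame.
            let requestHandler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer)

            // Reuse the same request object for every frame; its completion
            // handler receives the VNHumanBodyPoseObservation results.
            try requestHandler.perform([self.detectHumanBodyPoseRequest])
        } catch {
            print("Unable to perform the request: \(error.localizedDescription).")
        }
    }
    return videoProcessor
}
```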
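The completion-handler side isn't shown in the question. A minimal sketch of what it might look like, assuming the classifier was trained with Create ML's Action Classifier template and therefore expects a window of per-frame pose keypoints stacked along the first axis. `PoseWindowCollector`, the `classify` closure, and the window size of 60 are hypothetical; the real window size comes from the model's metadata.

```swift
import CoreML
import Vision

/// Collects per-frame pose keypoints into a fixed-size window and hands
/// each full window to a classifier. Hypothetical helper.
final class PoseWindowCollector {
    let predictionWindowSize: Int
    private var window: [MLMultiArray] = []
    private let classify: (MLMultiArray) -> Void

    init(predictionWindowSize: Int = 60, classify: @escaping (MLMultiArray) -> Void) {
        self.predictionWindowSize = predictionWindowSize
        self.classify = classify
    }

    /// A body pose request whose completion handler feeds this collector.
    /// Frames without a detected person are skipped in this sketch.
    func makeRequest() -> VNDetectHumanBodyPoseRequest {
        VNDetectHumanBodyPoseRequest { [weak self] request, error in
            guard error == nil,
                  let observation = (request.results as? [VNHumanBodyPoseObservation])?.first,
                  let keypoints = try? observation.keypointsMultiArray() else { return }
            self?.append(keypoints)
        }
    }

    /// Appends one frame's keypoints (shape [1, 3, 18]) and, once the window
    /// is full, concatenates it along axis 0 and passes it to `classify`.
    func append(_ keypoints: MLMultiArray) {
        window.append(keypoints)
        guard window.count == predictionWindowSize else { return }
        let input = MLMultiArray(concatenating: window, axis: 0, dataType: .float32)
        window.removeAll(keepingCapacity: true)
        classify(input)
    }
}
```

With this shape, `detectHumanBodyPoseRequest` could be created from `collector.makeRequest()`, and `classify` would run the Core ML model on each non-overlapping window.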

How could I improve the performance of this pipeline? In a test with an hour-long 4K video at 60 FPS, processing took several hours when running as a Mac Catalyst app on an M1 Max.
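It can help to turn that timing into a per-frame budget. An hour of 60 FPS video is 3,600 × 60 = 216,000 frames, and each one is decoded, run through a Vision request, and re-encoded. The sketch below does the arithmetic; the 3-hour runtime is a hypothetical stand-in for the unspecified "several hours".

```swift
/// Number of frames in a clip of the given duration and frame rate.
func frameCount(durationSeconds: Double, framesPerSecond: Double) -> Int {
    Int((durationSeconds * framesPerSecond).rounded())
}

/// Average wall-clock milliseconds per frame, and the resulting throughput.
func perFrameCost(frames: Int, elapsedSeconds: Double) -> (milliseconds: Double, framesPerSecond: Double) {
    (elapsedSeconds / Double(frames) * 1_000, Double(frames) / elapsedSeconds)
}

let frames = frameCount(durationSeconds: 3_600, framesPerSecond: 60)
// Hypothetical: "several hours" taken as 3 hours for illustration.
let cost = perFrameCost(frames: frames, elapsedSeconds: 3 * 3_600)
print(frames, cost.milliseconds, cost.framesPerSecond)
```

Under that assumption the pipeline spends about 50 ms per frame (20 FPS, a third of real time), whereas real-time processing would need decoding, Vision, and encoding to fit in about 16.7 ms (1/60 s) per frame.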

Accepted Answer

I would profile it with Instruments first to see where the time is spent. The code snippet alone doesn't show whether decoding, Vision, or encoding is the bottleneck.
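One way to make an Instruments trace easier to read is to mark each stage of the transfer loop with signpost intervals, which appear in the Points of Interest track of the Time Profiler template. A minimal sketch, assuming macOS 12 / iOS 15 or later for `OSSignposter`; the subsystem string and the `measureStage` helper are hypothetical.

```swift
import os

// Intervals logged with the Points of Interest category show up in
// Instruments' Points of Interest track.
let signposter = OSSignposter(subsystem: "com.example.PoseExport",
                              category: .pointsOfInterest)

/// Runs `body` inside a named signpost interval and returns its result.
/// Hypothetical helper; the interval ends even if `body` throws.
@discardableResult
func measureStage<T>(_ name: StaticString, _ body: () throws -> T) rethrows -> T {
    let state = signposter.beginInterval(name, id: signposter.makeSignpostID())
    defer { signposter.endInterval(name, state) }
    return try body()
}
```

Inside the loop, wrapping `readerOutput.copyNextSampleBuffer()`, `try sampleBufferProcessor?(sampleBuffer)`, and `writerInput.append(sampleBuffer)` in separate `measureStage` calls (for example named "Read", "Vision", and "Append") shows how long each stage takes per frame.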
