With user imported video, how do you filter for frames based on Vision analysis?

I'd like to run VNDetectHumanBodyPoseRequest on a video that the user imports through the system photo picker or a document view controller. I started with the "Building a Feature-Rich App for Sports Analysis" sample code, since it includes an example where video is imported from disk and then analyzed. However, my end goal is to filter for frames that contain certain poses, so that all frames without them are edited out (deleted), rather than drawing on the frames with detected trajectories as the sample code does. For pose detection I'm looking at the "Detecting Human Actions in a Live Video Feed" sample, but its live video capture isn't really relevant to my case.
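For a file on disk, one option that avoids the live-capture machinery entirely is to pull decoded frames with AVAssetReader and run the pose request on each one in a plain loop. This is a minimal sketch, not the sample code's approach: it assumes iOS 15 / macOS 12 or later (for the async `loadTracks`), assumes the video is upright (it passes `.up` instead of deriving an orientation from the track's `preferredTransform`), and takes a hypothetical `matchesPose` closure standing in for whatever rule decides that an observation is "the pose I want". `FrameResult` and `analyzePoses(in:matchesPose:)` are illustrative names, not framework API.

```swift
import AVFoundation
import Vision

/// One analyzed frame: its presentation time and whether a matching pose was found.
struct FrameResult {
    let time: CMTime
    let hasPose: Bool
}

/// Reads every frame of the first video track in `url` and runs a body-pose
/// request on it. `matchesPose` is a hypothetical caller-supplied rule.
func analyzePoses(in url: URL,
                  matchesPose: (VNHumanBodyPoseObservation) -> Bool) async throws -> [FrameResult] {
    let asset = AVURLAsset(url: url)
    guard let track = try await asset.loadTracks(withMediaType: .video).first else { return [] }

    let reader = try AVAssetReader(asset: asset)
    // Ask for decoded BGRA pixel buffers, which Vision accepts directly.
    let output = AVAssetReaderTrackOutput(
        track: track,
        outputSettings: [kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA])
    output.alwaysCopiesSampleData = false
    reader.add(output)
    guard reader.startReading() else {
        throw reader.error ?? CocoaError(.fileReadUnknown)
    }

    var results: [FrameResult] = []
    // copyNextSampleBuffer() returns nil when the track is exhausted (or reading failed).
    while let sample = output.copyNextSampleBuffer() {
        // Drain per-frame temporaries so memory doesn't grow over a long video.
        try autoreleasepool {
            guard let pixelBuffer = CMSampleBufferGetImageBuffer(sample) else { return }
            let request = VNDetectHumanBodyPoseRequest()
            // Assumption: upright video. Real code should map the track's
            // preferredTransform to a CGImagePropertyOrientation.
            let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up)
            try handler.perform([request])
            let found = (request.results ?? []).contains(where: matchesPose)
            results.append(FrameResult(time: CMSampleBufferGetPresentationTimeStamp(sample),
                                       hasPose: found))
        }
    }
    if reader.status == .failed, let error = reader.error { throw error }
    return results
}
```

Because the whole file is available up front, a simple loop like this may be enough; a Combine pipeline mainly pays off when frames arrive over time, as in the live-feed sample.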

I'm trying to break this down into smaller problems and have a few questions:
  • Should a full video file copy be made before analysis?

  • The Detecting Human Actions in a Live Video Feed sample code uses a Combine pipeline for analyzing live video frames. Since I'm analyzing imported video, would Combine be overkill or a good fit here?

  • After I've detected which frames have a particular pose, how (in AVFoundation terms) do I filter for those frames or edit out / delete the frames without that pose?
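On the third question: AVFoundation edits are typically expressed as time ranges rather than individual frames, so one approach is to first collapse the per-frame results into contiguous runs of matching frames. This framework-free sketch uses hypothetical names (`KeepRange`, `keepRanges(frames:lastFrameDuration:)`), keeps times in seconds as `Double`, and assumes the frames are sorted by time, with each frame lasting until the next frame's timestamp.

```swift
/// A half-open span of source time, in seconds, to keep: [start, end).
struct KeepRange: Equatable {
    var start: Double
    var end: Double
}

/// Collapses per-frame results into contiguous ranges of frames that matched.
/// Each frame is assumed to last until the next frame's timestamp; the final
/// frame lasts `lastFrameDuration` seconds.
func keepRanges(frames: [(time: Double, hasPose: Bool)],
                lastFrameDuration: Double) -> [KeepRange] {
    var ranges: [KeepRange] = []
    for (i, frame) in frames.enumerated() where frame.hasPose {
        let end = i + 1 < frames.count ? frames[i + 1].time : frame.time + lastFrameDuration
        if let last = ranges.last, last.end == frame.time {
            // This frame directly follows the current run, so extend it.
            ranges[ranges.count - 1].end = end
        } else {
            // A non-matching frame broke the run; start a new range.
            ranges.append(KeepRange(start: frame.time, end: end))
        }
    }
    return ranges
}
```

For frames at 0.0, 0.5, 1.0, 1.5 and 2.0 s flagged true, true, false, true, true, with `lastFrameDuration: 0.5`, this yields two ranges: 0.0 to 1.0 and 1.5 to 2.5.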

I've rewritten my problem more concisely below.

I'd like to perform pose analysis on user-imported video and automatically produce an AVFoundation video output in which only the frames with a detected pose are part of the result. In the "Building a Feature-Rich App for Sports Analysis" sample code, analysis happens in the implementation of the func cameraViewController(_ controller: CameraViewController, didReceiveBuffer buffer: CMSampleBuffer, orientation: CGImagePropertyOrientation) delegate callback, for example at line 326 of GameViewController.swift.

Where I'm stuck is using this analysis to keep only the frames with a detected pose. Say I've analyzed all the CMSampleBuffer frames and classified which ones contain the pose I want. How would I keep only those specific frames in the new video output?
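Once the kept spans are known as `CMTimeRange` values, a minimal sketch is to copy just those spans of the source video track back to back into an `AVMutableComposition` and export it with `AVAssetExportSession`; the frames without the pose are never inserted, so nothing has to be deleted. It assumes iOS 15 / macOS 12 or later for the async loading APIs and handles only the video track (audio would need its own composition track and the same `insertTimeRange` calls). `exportKeptRanges(of:ranges:to:)` is a hypothetical helper name.

```swift
import AVFoundation

/// Copies only `ranges` of the source video track, back to back, into a new
/// composition and exports it as a QuickTime movie at `outputURL`.
/// Note: the export fails if a file already exists at `outputURL`.
func exportKeptRanges(of sourceURL: URL,
                      ranges: [CMTimeRange],
                      to outputURL: URL) async throws {
    let asset = AVURLAsset(url: sourceURL)
    guard let sourceTrack = try await asset.loadTracks(withMediaType: .video).first else {
        throw CocoaError(.fileReadCorruptFile)
    }

    let composition = AVMutableComposition()
    guard let videoTrack = composition.addMutableTrack(
        withMediaType: .video, preferredTrackID: kCMPersistentTrackID_Invalid) else {
        throw CocoaError(.fileWriteUnknown)
    }
    // Carry over the source orientation (e.g. portrait iPhone footage).
    videoTrack.preferredTransform = try await sourceTrack.load(.preferredTransform)

    // Append each kept range right after the previous one; the gaps between
    // ranges (frames without the pose) are simply left out.
    var cursor = CMTime.zero
    for range in ranges {
        try videoTrack.insertTimeRange(range, of: sourceTrack, at: cursor)
        cursor = cursor + range.duration
    }

    guard let export = AVAssetExportSession(asset: composition,
                                            presetName: AVAssetExportPresetHighestQuality) else {
        throw CocoaError(.fileWriteUnknown)
    }
    export.outputURL = outputURL
    export.outputFileType = .mov
    await export.export()
    if export.status != .completed {
        throw export.error ?? CocoaError(.fileWriteUnknown)
    }
}
```

Ranges stored in seconds can be converted with `CMTimeRange(start: CMTime(seconds: start, preferredTimescale: 600), end: CMTime(seconds: end, preferredTimescale: 600))`. Because the composition references the source file rather than copying its media, the source URL has to stay readable until the export finishes.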