General guidelines for improving body pose action classifier performance

I just got an app feature working where the user imports a video file, each frame is fed to a custom action classifier, and then only the frames classified as a certain action are exported.

However, testing a one-hour 4K video at 60 FPS is taking an unreasonably long time: it has been processing for 7 hours now on a MacBook Pro with M1 Max running the Mac Catalyst app. Are there any techniques or general guidance that would help improve performance? As much as possible I'd like to preserve the input video quality, especially the frame rate. Hour-long videos are expected, since they are recordings of tennis sessions (anywhere from 10 minutes to a couple of hours). I made the body pose action classifier with Create ML.

The slow part is most likely the pose extraction, since the video is long and at 4K, 60 fps. If you only care about the action classification results (e.g., you only want to display the classification on the original video rather than render poses back onto every original frame), you may try downsampling the video for training and testing.

For example, you may train the model at 30 fps, and at test time skip every other frame of the 60 fps video before feeding it to the model. If your video is long, you may even try 20 fps, 16 fps, etc. for classification only. This likely won't affect classification performance much.
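The frame-skipping arithmetic can be sketched in a few lines of Swift. This is a minimal, hypothetical helper set (`samplingStride`, `sampledFrameIndices` and `sourceFrameRange` are illustrative names, not Apple APIs), assuming the classifier consumes poses at a fixed rate that matches the rate it was trained at. The last helper maps a classified window of sampled frames back to the original frame range, which is one way to keep the exported clip at the full 60 fps while classifying far fewer frames.

```swift
/// Hypothetical helper: how many source frames to advance per sampled frame.
func samplingStride(sourceFPS: Double, targetFPS: Double) -> Int {
    precondition(sourceFPS > 0 && targetFPS > 0, "frame rates must be positive")
    // Round to a whole stride and never sample faster than the source.
    return max(1, Int((sourceFPS / targetFPS).rounded()))
}

/// Indices of the source frames that would be sent to pose extraction.
func sampledFrameIndices(frameCount: Int, stride step: Int) -> [Int] {
    Array(stride(from: 0, to: frameCount, by: step))
}

/// Maps a window of sampled frames (e.g. one classifier prediction window)
/// back to the range of original frames it covers, clamped to the video length.
func sourceFrameRange(forSampledWindowStartingAt firstSample: Int,
                      windowLength: Int,
                      stride step: Int,
                      frameCount: Int) -> Range<Int> {
    let start = min(frameCount, firstSample * step)
    let end = min(frameCount, (firstSample + windowLength) * step)
    return start..<end
}

// Usage: a 60 fps source classified at 30 fps.
let step = samplingStride(sourceFPS: 60, targetFPS: 30)
print(step)
print(sampledFrameIndices(frameCount: 10, stride: step))
print(sourceFrameRange(forSampledWindowStartingAt: 3, windowLength: 60,
                       stride: step, frameCount: 216_000))
```

For a one-hour 60 fps video (216,000 frames), a stride of 2 halves the pose requests to 108,000. A target of 16 fps rounds to a stride of 4, i.e. an effective 15 fps. Whichever stride you pick, the sampled rate should match the model's training rate so its prediction window still spans the same amount of time.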

However, if you really want to render the extracted poses back onto every original frame, you may have to extract poses at the original frame rate. Alternatively, you could render poses on every other frame only; that likely won't hurt the visual result much.
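If poses are only extracted on every other frame, a simple sample-and-hold rule decides which pose to draw on each original frame. A hypothetical sketch (`poseIndex` is an illustrative name), assuming poses were extracted at a fixed stride starting from frame 0:

```swift
/// Hypothetical helper: index of the sampled pose to draw on a given source
/// frame, reusing the most recent sample (sample-and-hold).
/// Returns nil for invalid input.
func poseIndex(forSourceFrame frame: Int, stride step: Int, sampleCount: Int) -> Int? {
    guard step > 0, sampleCount > 0, frame >= 0 else { return nil }
    // Integer division picks the latest sample at or before this frame;
    // clamp so trailing frames reuse the final pose.
    return min(frame / step, sampleCount - 1)
}

// Usage: with a stride of 2, source frames 4 and 5 both reuse pose 2.
print(poseIndex(forSourceFrame: 4, stride: 2, sampleCount: 10) ?? -1)
print(poseIndex(forSourceFrame: 5, stride: 2, sampleCount: 10) ?? -1)
```

Each pose is therefore shown for two consecutive 60 fps frames. Interpolating keypoints between samples could look smoother, at the cost of extra work per frame.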

When processing a video file, VNVideoProcessor has an option to set a target frame rate (or time interval) for the requests it runs, such as pose extraction:

https://developer.apple.com/documentation/vision/vnvideoprocessor/requestprocessingoptions
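A minimal sketch of that option, assuming macOS 11+/iOS 14+ (which includes Mac Catalyst). `extractPoses(from:frameRate:)` and `TimedPose` are hypothetical names; the sketch keeps only the first detected person per frame, which is an assumption that suits a single-player recording but may not suit yours. Feeding the poses into the Create ML model is left out, because the generated model class and its input names depend on your project.

```swift
import AVFoundation
import Vision

/// A pose plus the video time of the frame it was detected in.
typealias TimedPose = (time: CMTime, pose: VNHumanBodyPoseObservation)

/// Hypothetical helper: runs body pose detection over a whole video, letting
/// VNVideoProcessor sample frames at `frameRate` instead of every source frame.
func extractPoses(from videoURL: URL, frameRate: Int) throws -> [TimedPose] {
    var results: [TimedPose] = []

    // Called once per processed frame while analyze(_:) is running.
    let request = VNDetectHumanBodyPoseRequest { request, error in
        guard error == nil,
              let observations = request.results as? [VNHumanBodyPoseObservation],
              let first = observations.first else { return }
        // timeRange tells you which part of the video this observation covers.
        results.append((time: first.timeRange.start, pose: first))
    }

    // The cadence controls how often the request runs (here: N frames per second).
    let options = VNVideoProcessor.RequestProcessingOptions()
    options.cadence = VNVideoProcessor.FrameRateCadence(frameRate)

    let processor = VNVideoProcessor(url: videoURL)
    try processor.addRequest(request, processingOptions: options)

    // analyze(_:) blocks until the range is processed, so call this off the
    // main thread. asset.duration is deprecated in newer SDKs in favor of
    // the async load(.duration), but still works.
    let asset = AVURLAsset(url: videoURL)
    try processor.analyze(CMTimeRange(start: .zero, duration: asset.duration))
    return results
}
```

For example, `try extractPoses(from: url, frameRate: 30)` on a 60 fps file would run the pose request on roughly half the frames. `VNVideoProcessor.TimeIntervalCadence` is the time-interval alternative mentioned above.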
