Hi, I am from the data science domain and I would like to perform an image classification task on each frame of a video that I have available locally on the phone. How do I achieve this?
In the official Core ML examples I could only find samples that work on the live camera feed, not on local video files already on the phone.
Any starting pointers are really appreciated!
Best, Veeresh
AVAssetReader allows you to read video frames from a movie file. It outputs CMSampleBuffers, each of which contains a CVPixelBuffer that can be fed to Core ML. Many color image models work best with the 32BGRA pixel format; you can request that AVAssetReader output this format through the outputSettings: parameter.
The following example reads from a movie file named "IMG_0522.mov" and runs Resnet50 image classification on each frame. Note that the Resnet50 class is auto-generated when you add Resnet50.mlmodel to the Xcode project. You can find the model in our model gallery (https://developer.apple.com/machine-learning/models/).
import AVFoundation
import CoreML

// Locate the movie in the bundle (Bundle.module is for Swift package
// resources; use Bundle.main in an app target).
let movieURL = Bundle.module.url(forResource: "IMG_0522", withExtension: "mov")!

// Load the auto-generated Resnet50 model.
let model = try! await Resnet50.load(configuration: MLModelConfiguration())

// Set up an AVAssetReader on the first video track of the movie.
let asset = AVAsset(url: movieURL)
let assetTrack = try! await asset.loadTracks(withMediaType: .video).first!
let assetReader = try! AVAssetReader(asset: asset)

// Ask the reader to decode frames as 224x224 32BGRA pixel buffers,
// which is what Resnet50 expects as input.
let outputSettings: [String: Any] = [
    String(kCVPixelBufferPixelFormatTypeKey): kCVPixelFormatType_32BGRA,
    String(kCVPixelBufferWidthKey): 224,
    String(kCVPixelBufferHeightKey): 224,
]
let assetReaderTrack = AVAssetReaderTrackOutput(track: assetTrack, outputSettings: outputSettings)
assetReader.add(assetReaderTrack)
assetReader.startReading()

// Pull sample buffers one by one and classify each decoded frame.
while let sampleBuffer = assetReaderTrack.copyNextSampleBuffer() {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
        continue
    }
    let prediction = try! model.prediction(image: pixelBuffer)
    let frameTime = String(format: "% 4.2f", CMSampleBufferGetPresentationTimeStamp(sampleBuffer).seconds)
    print("\(frameTime) seconds: \(prediction.classLabel)")
}
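
If the video is not bundled with the app (the question mentions video files already on the phone), any file URL works in place of the Bundle.module line above. Here is a minimal sketch, assuming the movie has been copied into the app's Documents directory; the file name is just a placeholder:

import Foundation

// Hypothetical alternative to Bundle.module: look the movie up in the
// app's Documents directory instead of the app bundle.
let documentsURL = FileManager.default.urls(for: .documentDirectory,
                                            in: .userDomainMask)[0]
let movieURL = documentsURL.appendingPathComponent("IMG_0522.mov")

// Make sure the file actually exists before handing it to AVAsset.
guard FileManager.default.fileExists(atPath: movieURL.path) else {
    fatalError("No movie found at \(movieURL.path)")
}

Note also that the example uses await, so it has to run in an async context, for example inside a Task or an async function.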