Core ML image classification on videos

Hi, I come from the data science domain, and I would like to perform image classification on each frame of a video. The video is available locally on the phone. How do I achieve this?

In the official Core ML examples, I could find samples that classify a live camera feed, but none that read from local video files on the phone.

Any starting pointers are really appreciated!

Best, Veeresh

Answered by Frameworks Engineer in 725466022

AVAssetReader allows you to read video frames from a movie file. It outputs CMSampleBuffers, each of which contains a CVPixelBuffer that can be fed to Core ML. Many color image models work best with the 32BGRA pixel format, and you can request that AVAssetReader output that format through the outputSettings: parameter.

The following example reads from a movie file named "IMG_0522.mov" and runs Resnet50 image classification on each frame. Note that the Resnet50 class is auto-generated when you add Resnet50.mlmodel to the Xcode project. You can find the model in our model gallery (https://developer.apple.com/machine-learning/models/).

// Locate the movie in the app bundle and load the Core ML model.
let movieURL = Bundle.module.url(forResource: "IMG_0522", withExtension: "mov")!
let model = try! await Resnet50.load(configuration: MLModelConfiguration())

// Set up an AVAssetReader on the movie's video track.
let asset = AVAsset(url: movieURL)
let assetTrack = try! await asset.loadTracks(withMediaType: .video).first!
let assetReader = try! AVAssetReader(asset: asset)

// Request 32BGRA pixel buffers scaled to the model's 224x224 input size.
let outputSettings: [String: Any] = [
    String(kCVPixelBufferPixelFormatTypeKey): kCVPixelFormatType_32BGRA,
    String(kCVPixelBufferWidthKey): 224,
    String(kCVPixelBufferHeightKey): 224,
]
let assetReaderTrack = AVAssetReaderTrackOutput(track: assetTrack, outputSettings: outputSettings)

assetReader.add(assetReaderTrack)
assetReader.startReading()

// Pull frames one at a time and classify each one.
while let sampleBuffer = assetReaderTrack.copyNextSampleBuffer() {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
        continue
    }

    let prediction = try! model.prediction(image: pixelBuffer)
    let frameTime = String(format: "% 4.2f", CMSampleBufferGetPresentationTimeStamp(sampleBuffer).seconds)
    print("\(frameTime) seconds: \(prediction.classLabel)")
}
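One caveat worth noting: copyNextSampleBuffer() returns nil both at the end of the file and when reading fails, so it is worth checking the reader's status once the loop exits. A minimal sketch, continuing with the assetReader from the example above:

```swift
// Check why the loop ended: .completed means every frame was read,
// while .failed carries an error describing what went wrong.
switch assetReader.status {
case .completed:
    print("Finished reading all frames.")
case .failed:
    print("Reading failed: \(assetReader.error?.localizedDescription ?? "unknown error")")
default:
    break
}
```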

Perhaps this can help you:

if let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) {
    // Pass imageBuffer to your Core ML model here.
}