Calling multiple CoreML Requests in Swift

I'm working with CoreML and trying to execute two models within the same queue, using the camera as a feed for image recognition. However, I can't seem to get VNCoreMLRequest to run two models at once. I'm new to iOS and Swift. Any suggestions on how to run two models on this request?


func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {

    var fitness_identifer = ""
    var fitness_confidence = 0

    guard let model_one = try? VNCoreMLModel(for: imagenet_ut().model) else { return }
    guard let model_two = try? VNCoreMLModel(for: ut_legs2().model) else { return }

    // This is the line the compiler rejects: I'm trying to pass both models.
    let request = VNCoreMLRequest(model: [model_one, model_two]) { (finishedRequest, error) in
        guard let results = finishedRequest.results as? [VNClassificationObservation] else { return }
        guard let Observation = results.first else { return }

        DispatchQueue.main.async(execute: {
            fitness_identifer = Observation.identifier
            fitness_confidence = Int(Observation.confidence * 100)
            self.label.text = "\(fitness_confidence)% it's a \(fitness_identifer)"
        })
    }

    guard let pixelBuffer: CVPixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

    try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
}

Here is the error I get when I attempt to pass the two models at once:

Contextual type 'VNCoreMLModel' cannot be used with array literal

Replies

So if I understand you correctly, you want to execute two models and combine the predictions of both?


I'm afraid you can't do that within a single request (VNCoreMLRequest can't be initialized with an array of models, as the error message suggests). You need to create and perform two requests and synchronize the results. You could, for instance, store the result of each request in scoped variables and, in each completion handler, check whether the result of the other request is already available and aggregate the two if so.
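
Here is a minimal, untested sketch of that approach. It assumes modelOne and modelTwo are cached VNCoreMLModel properties (see the sidenote below) and that label is the UILabel from your snippet; the aggregation shown is just one way to combine the two top observations:

import AVFoundation
import UIKit
import Vision

// Inside your AVCaptureVideoDataOutputSampleBufferDelegate:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

    var resultOne: VNClassificationObservation?
    var resultTwo: VNClassificationObservation?

    // Called from both completion handlers; only fires once both results exist.
    // perform(_:) below runs synchronously, so the two handlers can't race.
    func aggregateIfReady() {
        guard let one = resultOne, let two = resultTwo else { return }
        DispatchQueue.main.async {
            self.label.text = "\(one.identifier) (\(Int(one.confidence * 100))%) / \(two.identifier) (\(Int(two.confidence * 100))%)"
        }
    }

    let requestOne = VNCoreMLRequest(model: modelOne) { finishedRequest, _ in
        resultOne = (finishedRequest.results as? [VNClassificationObservation])?.first
        aggregateIfReady()
    }
    let requestTwo = VNCoreMLRequest(model: modelTwo) { finishedRequest, _ in
        resultTwo = (finishedRequest.results as? [VNClassificationObservation])?.first
        aggregateIfReady()
    }

    // A single handler can perform several requests on the same frame.
    try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([requestOne, requestTwo])
}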


Sidenote: I (and Apple) would highly recommend that you cache the models you use for your predictions instead of creating them anew for every video frame. Re-creating them each time comes with huge performance penalties.
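
For example (a sketch using the generated model classes from your snippet), you could create the VNCoreMLModels once, as stored properties of the class that owns captureOutput:

// Built once, on first access, then reused for every frame.
// try! is a sketch-level shortcut; handle the failure properly in real code.
lazy var modelOne: VNCoreMLModel = try! VNCoreMLModel(for: imagenet_ut().model)
lazy var modelTwo: VNCoreMLModel = try! VNCoreMLModel(for: ut_legs2().model)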

For optimal performance you'll need to combine both models into a single model that takes one input (the image) and produces two outputs.
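
Building that combined model has to happen offline (e.g. as a model pipeline in Python tooling; there is no Vision API for merging models at runtime). Once you have it, Vision reports the outputs of a multi-output model as feature-value observations rather than classification observations. A rough sketch, where combinedModel and the output handling are hypothetical:

let request = VNCoreMLRequest(model: combinedModel) { finishedRequest, _ in
    guard let observations = finishedRequest.results as? [VNCoreMLFeatureValueObservation] else { return }
    for observation in observations {
        // featureName identifies which of the two model outputs this is.
        print(observation.featureName, observation.featureValue)
    }
}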