Context
I've trained an object detection model in Create ML with more than 4,000 images. In the Preview tab I can check the prediction for image "A": it detects two labels with 100% confidence, and the bounding boxes look accurate.
The problem itself
However, inside a Swift Playground, when I perform object detection using the same model and the same image, I don't get the same results.
What I expected
I expected that, after performing the request, processing the array of VNRecognizedObjectObservation would show the very same results that appear in the Create ML Preview.
Notes:
I import the model into the playground by drag and drop.
I trained the model using images in JPEG format.
The test image is rotated so that it looks vertical, using the macOS Finder rotation tool.
When creating the VNImageRequestHandler, I've tried passing a different orientation, with the same result.
Swift Playground code
This is the code I'm using.
import UIKit
import Vision

do {
    let model = try MYMODEL_FROMCREATEML(configuration: MLModelConfiguration())
    let mlModel = model.model
    let coreMLModel = try VNCoreMLModel(for: mlModel)
    let request = VNCoreMLRequest(model: coreMLModel) { request, error in
        guard let results = request.results as? [VNRecognizedObjectObservation] else {
            return
        }
        results.forEach { result in
            print(result.labels)
            print(result.boundingBox)
        }
    }
    let image = UIImage(named: "TEST_IMAGE.HEIC")!
    let requestHandler = VNImageRequestHandler(cgImage: image.cgImage!)
    try requestHandler.perform([request])
} catch {
    print(error)
}
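For reference, the variation mentioned in the notes (passing an explicit orientation to VNImageRequestHandler) can be sketched roughly as below. The code above uses image.cgImage, which does not carry the UIImage's orientation, so here the orientation is forwarded by hand. This is only a minimal sketch: the helper names exifOrientation(for:) and makeHandler(for:) are hypothetical, and whether this alone accounts for the difference from the Create ML Preview is not confirmed.

```swift
import UIKit
import Vision
import ImageIO

// Hypothetical helper (not part of the code above): maps UIKit's
// orientation enum to the EXIF-style enum that Vision expects.
func exifOrientation(for orientation: UIImage.Orientation) -> CGImagePropertyOrientation {
    switch orientation {
    case .up: return .up
    case .upMirrored: return .upMirrored
    case .down: return .down
    case .downMirrored: return .downMirrored
    case .left: return .left
    case .leftMirrored: return .leftMirrored
    case .right: return .right
    case .rightMirrored: return .rightMirrored
    @unknown default: return .up // Assumption: treat unknown cases as upright.
    }
}

// Hypothetical helper: builds a request handler that forwards the
// UIImage's own orientation instead of dropping it.
func makeHandler(for image: UIImage) -> VNImageRequestHandler? {
    guard let cgImage = image.cgImage else { return nil }
    return VNImageRequestHandler(cgImage: cgImage,
                                 orientation: exifOrientation(for: image.imageOrientation),
                                 options: [:])
}
```

With this in place, the handler in the playground would be created with makeHandler(for: image) instead of VNImageRequestHandler(cgImage:).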
Additional Notes & Uncertainties
Not sure if this is relevant, but just in case: I trained the model using pictures I took with my iPhone in 48 MP HEIC format. All photos were taken in portrait (vertical) orientation. With a Python script I overwrote the EXIF orientation to 1 (Normal). I did this so that I could annotate the images with the CVAT tool and then convert the annotations to the Create ML annotation format.
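Since orientation metadata plays a role in both the training images and the Finder-rotated test image, one way to check which EXIF orientation a file actually stores is sketched below using ImageIO. The function name storedOrientation(ofImageAt:) is hypothetical, and the sketch only reads the tag; it does not say whether the pixels themselves were rotated.

```swift
import Foundation
import ImageIO

// Hypothetical helper: returns the orientation tag stored in the image file,
// or nil if the file cannot be read or carries no orientation tag.
// A file without the tag is conventionally treated as orientation 1 (up).
func storedOrientation(ofImageAt url: URL) -> CGImagePropertyOrientation? {
    guard let source = CGImageSourceCreateWithURL(url as CFURL, nil),
          let properties = CGImageSourceCopyPropertiesAtIndex(source, 0, nil) as? [CFString: Any],
          let rawValue = properties[kCGImagePropertyOrientation] as? UInt32
    else { return nil }
    return CGImagePropertyOrientation(rawValue: rawValue)
}
```

Running this on one training image and on the test image would show whether they carry the same orientation value.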
Assumption #1
I've read that object detection in Create ML is based on the YOLOv3 architecture, whose first layer resizes the input image. I take that to mean I don't have to worry about using very large images to train my model. Is this correct?
Assumption #2
This also makes me assume that the same resizing happens when I make a prediction. Is that right?
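On the prediction side, Vision scales the input image to the model's input size before running a VNCoreMLRequest, and how it does so is controlled by the request's imageCropAndScaleOption property. A minimal sketch for trying the different options follows. The function name makeDetectionRequest(model:cropAndScale:) is hypothetical, and which option matches the Create ML Preview is an open question here, not something this sketch settles.

```swift
import Vision
import CoreML

// Hypothetical helper: creates a Core ML request with an explicit
// crop-and-scale behaviour so that different options can be compared.
func makeDetectionRequest(model: VNCoreMLModel,
                          cropAndScale: VNImageCropAndScaleOption) -> VNCoreMLRequest {
    let request = VNCoreMLRequest(model: model)
    // .centerCrop can cut off parts of a portrait image,
    // .scaleFit keeps the aspect ratio,
    // .scaleFill stretches the whole image to the model's input size.
    request.imageCropAndScaleOption = cropAndScale
    return request
}
```

Performing the request once each with .centerCrop, .scaleFit, and .scaleFill, and then reading request.results, would show whether the scaling mode changes the detections for the test image.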