Context

I've trained an object detection model on more than 4,000 images. In the Create ML Preview, the prediction for image "A" detects two labels at 100% confidence, and the bounding boxes look accurate.

The problem itself

However, when I run object detection in a Swift Playground with the same model and the same image, I don't get the same results.

What I expected

After performing the request and processing the array of VNRecognizedObjectObservation, I expected to see the very same results that appear in the Create ML Preview.

Notes:

I import the model into the playground by drag and drop. I trained on images in JPEG format. The test image was rotated with the macOS Finder rotation tool so that it looks vertical. I've tried passing a different orientation when creating the VNImageRequestHandler, with the same result.

Swift Playground code

This is the code I'm using.

```swift
import UIKit
import Vision

do {
    // Load the Create ML model and wrap it for use with Vision.
    let model = try MYMODEL_FROMCREATEML(configuration: MLModelConfiguration())
    let mlModel = model.model
    let coreMLModel = try VNCoreMLModel(for: mlModel)

    // Print the labels and bounding box of every detected object.
    let request = VNCoreMLRequest(model: coreMLModel) { request, error in
        guard let results = request.results as? [VNRecognizedObjectObservation] else {
            return
        }
        results.forEach { result in
            print(result.labels)
            print(result.boundingBox)
        }
    }

    let image = UIImage(named: "TEST_IMAGE.HEIC")!
    let requestHandler = VNImageRequestHandler(cgImage: image.cgImage!)
    try requestHandler.perform([request])
} catch {
    print(error)
}
```

Additional Notes & Uncertainties

Not sure if this is relevant, but just in case: I trained the model on pictures taken with my iPhone in 48MP HEIC format. All photos were taken in vertical position. With a Python script I overwrote the EXIF orientation to 1 (Normal) so that I could annotate the images with the CVAT tool and then convert the annotations to the Create ML format.
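Since both the Finder rotation and the EXIF rewrite touch orientation metadata, one thing that may be worth ruling out is how the image reaches Vision. `UIImage.cgImage` returns the underlying bitmap without applying `imageOrientation`, so the orientation has to be handed to the request handler separately. The sketch below shows one way to do that and also sets the request's crop-and-scale option explicitly. The helper names `makeHandler(for:)` and `configure(_:)` are hypothetical, and choosing `.scaleFill` is an assumption about how the Create ML Preview feeds images to an object detector, not something confirmed in this thread.

```swift
import UIKit
import Vision
import ImageIO

// UIImage.Orientation and CGImagePropertyOrientation use different raw
// values, so the cases are mapped one by one instead of casting raw values.
extension CGImagePropertyOrientation {
    init(_ orientation: UIImage.Orientation) {
        switch orientation {
        case .up: self = .up
        case .upMirrored: self = .upMirrored
        case .down: self = .down
        case .downMirrored: self = .downMirrored
        case .left: self = .left
        case .leftMirrored: self = .leftMirrored
        case .right: self = .right
        case .rightMirrored: self = .rightMirrored
        @unknown default: self = .up
        }
    }
}

// Hypothetical helper: builds a request handler that carries the
// UIImage's orientation along with its pixel data.
func makeHandler(for image: UIImage) -> VNImageRequestHandler? {
    guard let cgImage = image.cgImage else { return nil }
    return VNImageRequestHandler(
        cgImage: cgImage,
        orientation: CGImagePropertyOrientation(image.imageOrientation),
        options: [:]
    )
}

// Hypothetical helper: makes the resize behaviour explicit.
// Assumption: the detector was trained on whole images stretched to the
// network input, which corresponds to .scaleFill.
func configure(_ request: VNCoreMLRequest) {
    request.imageCropAndScaleOption = .scaleFill
}
```

With these in place, the handler in the code above would be created with `makeHandler(for: image)` and `configure(request)` would be called before `perform`. Trying `.scaleFit` and `.centerCrop` as well would show whether the scaling option accounts for the difference.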
Assumption #1

I've read that object detection in Create ML is based on the YOLOv3 architecture, whose first layer resizes the input image. I take this to mean that I don't have to worry about using very large images to train my model. Is this correct?

Assumption #2

This also makes me assume that the same resizing happens when I make a prediction. Is that right?
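To make the two assumptions concrete, here is a rough sketch of the arithmetic involved when a large portrait photo is reduced to a small square network input. The 6048 x 8064 photo size and the 416 x 416 input size are assumptions for illustration only. The real input size can be read from the model's `modelDescription.inputDescriptionsByName` via each feature's `imageConstraint`. The type and function names are hypothetical.

```swift
// Hypothetical value type so the sketch does not depend on any framework.
struct PixelSize {
    let width: Double
    let height: Double
}

// Per-axis factors when the image is stretched to the input size
// (aspect ratio is not preserved, as with Vision's .scaleFill).
func scaleFillFactors(from image: PixelSize, to input: PixelSize) -> (x: Double, y: Double) {
    (x: input.width / image.width, y: input.height / image.height)
}

// Single uniform factor when the whole image is fitted inside the input
// (aspect ratio preserved, as with Vision's .scaleFit).
func scaleFitFactor(from image: PixelSize, to input: PixelSize) -> Double {
    min(input.width / image.width, input.height / image.height)
}

let photo = PixelSize(width: 6048, height: 8064)     // assumed portrait 48MP photo
let modelInput = PixelSize(width: 416, height: 416)  // assumed detector input size

let fill = scaleFillFactors(from: photo, to: modelInput)
let fit = scaleFitFactor(from: photo, to: modelInput)
print("scaleFill x:", fill.x, "y:", fill.y, "scaleFit:", fit)
```

Under these assumed sizes the two axes shrink by different factors when the image is stretched, so objects reach the network with a different aspect ratio than under a fit or crop. If training and prediction use different scaling modes, the same image could produce different detections even though it is resized in both cases.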
Posted by joe_dev.