Context
So basically I've trained my model for object detection with over 4,000 images. In the Preview tab I'm able to check the prediction for image "A": it detects two labels with 100% confidence, and their bounding boxes look accurate.
The problem itself
However, inside the Swift Playground, when I try to perform object detection using the same model and the same image, I don't get the same results.
What I expected
That after performing the request, the processed array of VNRecognizedObjectObservation would show the very same results that appear in the Create ML Preview.
Notes:
- I'm importing the model into the playground by simple drag and drop.
- The training images were in JPEG format.
- The test image is rotated so that it looks vertical, using the macOS Finder rotation tool.
- While creating the VNImageRequestHandler, I've tried passing different orientations, with the same result (see the sketch after this list).
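For reference, this is roughly how I've been passing the orientation. The UIImage.Orientation-to-CGImagePropertyOrientation conversion is adapted from Apple's sample code; TEST_IMAGE.HEIC is the same test image used in the playground code below:

import UIKit
import Vision
import ImageIO

// Vision takes a CGImagePropertyOrientation, while UIImage stores a
// UIImage.Orientation, so map between the two explicitly.
extension CGImagePropertyOrientation {
    init(_ orientation: UIImage.Orientation) {
        switch orientation {
        case .up: self = .up
        case .down: self = .down
        case .left: self = .left
        case .right: self = .right
        case .upMirrored: self = .upMirrored
        case .downMirrored: self = .downMirrored
        case .leftMirrored: self = .leftMirrored
        case .rightMirrored: self = .rightMirrored
        @unknown default: self = .up
        }
    }
}

let image = UIImage(named: "TEST_IMAGE.HEIC")!
let handler = VNImageRequestHandler(
    cgImage: image.cgImage!,
    orientation: CGImagePropertyOrientation(image.imageOrientation)
)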
Swift Playground code
This is the code I'm using.
import UIKit
import Vision

do {
    // Load the Create ML model and wrap it for Vision.
    let model = try MYMODEL_FROMCREATEML(configuration: MLModelConfiguration())
    let mlModel = model.model
    let coreMLModel = try VNCoreMLModel(for: mlModel)

    // Build the detection request; results arrive in the completion handler.
    let request = VNCoreMLRequest(model: coreMLModel) { request, error in
        guard let results = request.results as? [VNRecognizedObjectObservation] else {
            return
        }
        results.forEach { result in
            print(result.labels)
            print(result.boundingBox) // normalized rect, lower-left origin
        }
    }

    // Note: no orientation is passed here, so Vision assumes .up.
    let image = UIImage(named: "TEST_IMAGE.HEIC")!
    let requestHandler = VNImageRequestHandler(cgImage: image.cgImage!)
    try requestHandler.perform([request])
} catch {
    print(error)
}
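One thing worth noting when comparing against the Preview overlay: boundingBox is a normalized rect (0...1) with the origin at the lower-left, while the Preview draws in pixels. A minimal sketch of the conversion, assuming I pass in the image's pixel dimensions (the helper name is mine):

import Vision
import CoreGraphics

// Hypothetical helper: convert an observation's normalized bounding box
// into pixel coordinates for the original image.
func pixelRect(for observation: VNRecognizedObjectObservation,
               imageWidth: Int, imageHeight: Int) -> CGRect {
    // VNImageRectForNormalizedRect scales the normalized rect up to
    // image coordinates (still lower-left origin, not UIKit's top-left).
    VNImageRectForNormalizedRect(observation.boundingBox, imageWidth, imageHeight)
}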
Additional Notes & Uncertainties
Not sure if this is relevant, but just in case: I trained the model using pictures I took with my iPhone in 48MP HEIC format. All photos were taken in portrait orientation. With a Python script I overwrote the EXIF orientation tag to 1 (Normal); this was so I could annotate the images in the CVAT tool and then convert the annotations to the Create ML format.
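In case it matters, this is how one could double-check what orientation tag actually ended up in the file after the Finder rotation, using ImageIO (the file path is a placeholder):

import Foundation
import ImageIO

// Read the orientation tag straight from the file's metadata,
// without UIKit applying any interpretation. 1 == Normal / .up.
let url = URL(fileURLWithPath: "/path/to/TEST_IMAGE.HEIC") // placeholder path
if let source = CGImageSourceCreateWithURL(url as CFURL, nil),
   let properties = CGImageSourceCopyPropertiesAtIndex(source, 0, nil) as? [CFString: Any] {
    print(properties[kCGImagePropertyOrientation] ?? "no orientation tag")
}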
Assumption #1
I've read that Object Detection in Create ML is based on the YOLOv3 architecture, whose first layer resizes the input image to a fixed dimension, meaning I shouldn't have to worry about using very large images to train my model. Is this correct?
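One way to verify the fixed input size is to read it off the model description itself; a quick sketch using the same model class as in the playground code:

import CoreML

do {
    let model = try MYMODEL_FROMCREATEML(configuration: MLModelConfiguration())
    // Each input description includes the image size constraint the
    // network expects, e.g. a fixed 416x416 for YOLO-style detectors.
    for (name, description) in model.model.modelDescription.inputDescriptionsByName {
        print(name, description)
    }
} catch {
    print(error)
}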
Assumption #2
This also makes me assume that the same resizing happens when I make a prediction. Is that the case?
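If so, I assume how Vision fits my image into that fixed input also matters. VNCoreMLRequest exposes this as imageCropAndScaleOption; its default is .centerCrop, which on a tall portrait image would cut off the top and bottom. I understand Create ML detection models are usually meant to be run with .scaleFill, though I'm not certain of that:

// Set on the same `request` from the playground code above.
// .centerCrop (the default) crops a tall image to a square, which
// would change which objects are visible and shift bounding boxes;
// .scaleFill stretches the whole image into the model's input size.
request.imageCropAndScaleOption = .scaleFill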