CoreML could not detect object in iPhone camera's portrait mode

I successfully trained an object detection model and exported it in Core ML format.
The model was trained for 300 iterations and its mean_average_precision is about 0.7.


I then validated it on some images using Turi Create and a bounding-box drawing utility, and it recognized the object pretty well.

I then downloaded the sample project for recognizing objects in live capture here:
https://developer.apple.com/documentation/vision/recognizing_objects_in_live_capture


What's weird is that the object can't be detected in the iPhone camera's portrait mode: no bounding box is drawn and VNDetectedObjectObservation returns no results. However, when I rotated the iPhone 90 degrees counter-clockwise (home button on the right), the bounding box appeared. If I then move the phone horizontally, the offset between the bounding box and the object gets larger.
The same problem happens if I train the model with the Create ML app.

I'm not sure whether the problem is in Turi Create, Core ML, or the sample code itself. Could anyone explain, please?

Replies

I had an issue with this example code using my own model. It turns out that imageCropAndScaleOption needed to be set to .scaleFill in the iOS code so that detection was done over the whole image. By default, detection cropped a square from the middle of the frame, and anything outside that square was not considered for detection. The bounding boxes also used a strange coordinate system, which meant they did not line up with the image until the fix.


This is from my own code, but you should be able to find the documentation and work out where it goes, even if the variable name is different.

objectRecognition.imageCropAndScaleOption = .scaleFill
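
For context, this is roughly where that line sits when building the Vision request. A sketch only, not the sample's exact code; ObjectDetector stands in for whatever class Xcode generated from your .mlmodel.

import Vision
import CoreML

// Sketch: build the Vision request for a Core ML object detector and run it
// over the whole frame instead of a centre-cropped square.
// ObjectDetector is a placeholder for your generated model class.
func makeDetectionRequest() throws -> VNCoreMLRequest {
    let coreMLModel = try ObjectDetector(configuration: MLModelConfiguration()).model
    let visionModel = try VNCoreMLModel(for: coreMLModel)

    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        guard let results = request.results as? [VNRecognizedObjectObservation] else { return }
        for observation in results {
            // boundingBox is normalised (0...1) with the origin at the bottom-left.
            print(observation.labels.first?.identifier ?? "?", observation.boundingBox)
        }
    }
    // Scale the full frame to the model's input size rather than cropping a square.
    request.imageCropAndScaleOption = .scaleFill
    return request
}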


I could see that imageCropAndScaleOption was wrong in Xcode before making the change.
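
On the coordinate system: Vision's boundingBox is normalised with the origin at the bottom-left, so it has to be scaled and flipped before drawing in UIKit's top-left space. Roughly what I mean (a sketch, names are placeholders):

import Vision
import UIKit

// Sketch: convert a normalised Vision bounding box (origin bottom-left) into a
// CGRect in a UIKit view's coordinate space (origin top-left). Assumes the frame
// was scaled with .scaleFill so the normalised box maps onto the whole view.
func viewRect(for observation: VNRecognizedObjectObservation, in viewSize: CGSize) -> CGRect {
    // Scale the normalised rect up to the view's dimensions.
    let rect = VNImageRectForNormalizedRect(observation.boundingBox,
                                            Int(viewSize.width),
                                            Int(viewSize.height))
    // Flip vertically because UIKit's origin is at the top-left.
    return CGRect(x: rect.origin.x,
                  y: viewSize.height - rect.origin.y - rect.height,
                  width: rect.width,
                  height: rect.height)
}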


I think ideally you train with the same aspect ratio, or with a mix of landscape and portrait images if you need to support both. If the training data has no typical orientation, for example a perfect plan view of objects, I think you can fix the orientation. It gets a bit confusing. For my projects I have only supported one orientation to keep it simple.

Thanks for your answer. However, I found that the scale option only makes the bounding box draw more precisely onto the object. The orientation problem still exists for me. I use MakeML to label images; it exports all images at 150x200, and all of them were taken in portrait mode. I just want the model to detect in portrait mode, but it doesn't.
For more information about the scaleFill option's purpose, see:
https://github.com/apple/turicreate/issues/1016

If you have trained with all the objects in one orientation, this could imply the image is being interpreted wrongly. You may need to fiddle with the EXIF orientation. For testing, hard-wire it to a value like CGImagePropertyOrientation.up. It can be varied with the device orientation, but I hard-wired it. Is your training data portrait as well? I think it helps if it matches. If you detect on an image with a different aspect ratio from the training images, I think it kind of works. However, if you have trained with one orientation and the orientations do not match, it will probably not work well. Does MakeML have a test app? Is that working?
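
Something like this is what I mean by hard-wiring it for testing. A sketch only, assuming a detectionRequest built as above; swap .up for .right, .left, etc. until portrait behaves:

import Vision
import CoreVideo
import ImageIO

// Sketch: run the detector on a captured frame with the EXIF orientation
// hard-wired for testing instead of derived from the device orientation.
func detect(_ pixelBuffer: CVPixelBuffer, using detectionRequest: VNCoreMLRequest) {
    let fixedOrientation = CGImagePropertyOrientation.up  // hard-wired for testing

    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                        orientation: fixedOrientation,
                                        options: [:])
    do {
        try handler.perform([detectionRequest])
    } catch {
        print("Vision request failed: \(error)")
    }
}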

MakeML converts my images to this: https://ibb.co/QD4d3NX
My original images were taken on the iPhone in portrait mode with EXIF orientation 6 (90 degrees CCW): https://ibb.co/7bYqMzw
I also tried converting the images to EXIF orientation 1 (normal) before passing them to MakeML to create the images & annotations dataset:
https://ibb.co/0Xk8W1b
None of these work.
Which step could possibly be the problem here?
1. Original images?
2. Converted images and annotations?
3. Turi Create problem? Here is my code:


import turicreate as tc

# Load the images and join them with their bounding-box annotations
images = tc.load_images('/Users/kid/Desktop/TowerIp6LocationOnDataset/images')
annotations = tc.SFrame('/Users/kid/Desktop/TowerIp6LocationOnDataset/annotations.csv')
data = images.join(annotations)

model = tc.object_detector.create(data, max_iterations=600)

# Evaluation (testdata is a held-out SFrame prepared the same way as data)
model.evaluate(testdata)
# This gives mean_average_precision_50 ~ 0.85 -> very high

# Export to Core ML
model.export_coreml('/Users/kid/Desktop/TowerIp6LocationOn.mlmodel')

4. CoreML problem?
5. Breakfast Finder source code problem?

Orientation is how the data is mapped to an image the correct way up. If it looks right, then I think MakeML should interpret it correctly. The data looks OK to me. The "Pixel Height" is less than the width, so the data is "landscape" but it is rotated; so that's fine, it's a portrait image. In the phone you need to do the same thing with your frame grab (if this is the issue). If your width is the longer side, then set the orientation to be a rotation. It's in the function exifOrientationFromDeviceOrientation() in the example, I think.
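
For reference, this is the kind of back-camera mapping I have in mind. It is my own sketch, not necessarily line-for-line what the sample's exifOrientationFromDeviceOrientation() returns:

import UIKit
import ImageIO

// Sketch: map the device orientation to the EXIF orientation of back-camera
// pixel buffers, which come off the sensor in landscape (home button on the right).
func exifOrientation(for deviceOrientation: UIDeviceOrientation) -> CGImagePropertyOrientation {
    switch deviceOrientation {
    case .portrait:            return .right  // buffer needs a 90° CW rotation to display upright
    case .portraitUpsideDown:  return .left
    case .landscapeLeft:       return .up     // home button on the right: buffer already upright
    case .landscapeRight:      return .down
    default:                   return .up
    }
}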

You may have a mistake when saying "The 'Pixel Height' is less than the width, so the data is 'landscape' but it is rotated"
-> in both images I sent, the Pixel Height is greater than the width.
I also tried fixing the return value of exifOrientationFromDeviceOrientation() to every value the enum can take (.up, .down, etc.), but in portrait mode the phone still cannot detect the tower with a correct bounding box.
I set a breakpoint to inspect the pixelBuffer in VisionObjectRecognitionViewController, in captureOutput(), and found that the image represented by the pixelBuffer does not display normally in portrait: it is shown rotated 90 degrees counter-clockwise. I then built a new ML model with all input images rotated 90 degrees counter-clockwise. After exporting that model and dropping it into the BreakfastFinder sample, the app can now detect in portrait mode with the correct bounding box. But landscape-mode detection no longer works.


Do you think it's normal?

I don't know how to explain this situation. That's weird.



I am having the same issue and my bounding boxes are way too big. Did you solve the problem?

Why does it only work in landscape?