X-Axis of VNFaceObservation boundingBox Issue

Hi, I'm not sure whether to post this here in Vision or in ARKit as it pertains to both. I followed the Apple project "Using Vision in Real Time with ARKit" and added Vision's VNDetectFaceRectanglesRequest.


The issue I have is mapping from Vision's coordinates to the AR/video view on screen. I'm testing in portrait mode and the Y-axis works fine, but the X-axis seems to be wrong, presumably because ARKit's camera image has a wider field of view than what the iPhone actually displays.


What is the correct way to display the boundingBox in screen (UIKit/ARKit) space? I can't use methods like `self.cameraLayer.layerRectConverted(fromMetadataOutputRect: transformedRect)` since I'm not using an AVCaptureSession (I'm using ARKit instead).


The following code maps the Y-axis correctly, but the X-axis is skewed (increasingly so toward the sides of the screen).


        // face is an instance of VNFaceObservation
        // Flip the Y-axis (Vision's origin is at the lower-left of the image).
        let transform = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -view.frame.height)
        // Scale the normalized (0...1) boundingBox up to view points.
        let translate = CGAffineTransform.identity.scaledBy(x: view.frame.width, y: view.frame.height)
        let rect = face.boundingBox.applying(translate).applying(transform)


Using ARKit + Vision, I'm not sure how to convert the X-axis from Vision's normalized rect into ARKit/UIKit's coordinate space. The X origin of the resulting CGRect is noticeably off; it seems like it should sit further outward, because the camera's field of view is wider than the screen.


Thank you

Replies

Still working on solving this problem. I understand it better now, but I'm not sure of the best path forward.


The problem is that ARKit provides the image buffer (frame.capturedImage) at a camera resolution of 1920 x 1440, while the screen of the iPhone XS is 375 x 812 points. ARKit can see more than it displays on the phone's screen, and Vision is performing face rectangle detection on the full image seen by the camera (which is overkill for my purpose). The problem I ran into above is that the mapping between what the camera sees and what the user sees on screen is not 1:1: what is at the far left of the camera image (and detected by Vision) may lie outside the frame shown to the user.
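For what it's worth, here is a rough sketch of the aspect-fill arithmetic behind that mismatch (assuming the view aspect-fills the camera image, which is ARSCNView's default behavior; the exact numbers vary by device and ARKit video format):

        import UIKit

        let imageSize = CGSize(width: 1440, height: 1920) // capturedImage rotated to portrait
        let viewSize  = CGSize(width: 375, height: 812)   // iPhone XS screen in points

        // Aspect-fill: scale the camera image so it covers the whole view.
        let scale = max(viewSize.width / imageSize.width, viewSize.height / imageSize.height)
        let displayedSize = CGSize(width: imageSize.width * scale,
                                   height: imageSize.height * scale)
        // displayedSize ≈ 609 x 812 points, so roughly 234 points of image width
        // (about 117 on each side) are cropped off screen. A naive x-mapping that
        // ignores this crop drifts more and more toward the screen edges.
        let croppedWidth = displayedSize.width - viewSize.width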


1. Going forward, I need to convert the VNFaceObservation's boundingBox into one that fits the phone's screen. I've tried using ARFrame's displayTransform(for:viewportSize:), but I don't think I'm using it correctly (if it is even the correct method to use); a sketch of one possible usage follows this list.


2. Another option (I'm not sure of its complexity) would be to crop the image Vision receives so it matches the user's screen space, which would also make the Vision request less expensive (a regionOfInterest sketch is further down).
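Regarding option 1, here is a minimal sketch of how displayTransform(for:viewportSize:) could be chained with the usual Vision Y-flip, assuming the request was run directly on frame.capturedImage. The function name and parameters are placeholders, and depending on which orientation you hand to VNImageRequestHandler, the flip step may need adjusting or dropping:

        import ARKit
        import Vision

        // Sketch: map a Vision boundingBox (normalized, lower-left origin,
        // relative to frame.capturedImage) into view points via displayTransform.
        func viewRect(for boundingBox: CGRect,
                      in frame: ARFrame,
                      viewportSize: CGSize,
                      orientation: UIInterfaceOrientation = .portrait) -> CGRect {
            // Flip from Vision's lower-left origin to an upper-left origin in
            // normalized image coordinates.
            let flip = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -1)
            // Rotate and crop from normalized image coordinates into normalized
            // viewport coordinates; this is the step that accounts for the parts
            // of the camera image that never make it onto the screen.
            let display = frame.displayTransform(for: orientation, viewportSize: viewportSize)
            // Scale normalized viewport coordinates up to points.
            let toPoints = CGAffineTransform(scaleX: viewportSize.width, y: viewportSize.height)
            return boundingBox.applying(flip).applying(display).applying(toPoints)
        }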


If anyone has any tips on accomplishing either of these two tasks, it would be much appreciated.
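On option 2, rather than physically cropping the pixel buffer, a cheaper possibility might be VNImageBasedRequest's regionOfInterest, which limits detection to a normalized sub-rectangle of the image. The visibleRegion value below is a placeholder; it would have to be derived from the portion of the camera image that is actually on screen (e.g. via the inverse of displayTransform):

        import ARKit
        import Vision

        // Sketch: restrict face detection to the visible part of the captured
        // image instead of running it on the full 1920 x 1440 buffer.
        func detectVisibleFaces(in frame: ARFrame, visibleRegion: CGRect) throws -> [VNFaceObservation] {
            let request = VNDetectFaceRectanglesRequest()
            // Normalized rect, lower-left origin, in the coordinates of the full image.
            request.regionOfInterest = visibleRegion

            let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage, options: [:])
            try handler.perform([request])

            // Caveat: with a regionOfInterest set, the returned boundingBoxes may be
            // reported relative to that region rather than the full image, so
            // double-check the mapping back to screen coordinates.
            return (request.results as? [VNFaceObservation]) ?? []
        }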

I'm facing the same issue. The Vision results are good and I get a perfectly cropped image, but when I map the Vision bounding box with a CGAffineTransform (the same one you are using), the Y-axis is always correct while the X-axis is not. The same code works on all other devices except the iPhone 8, 8 Plus, XR, and XS. What I'm doing is detecting rectangles with Vision's VNDetectRectanglesRequest and then finding the real-world width and height of the rectangle using ARKit. Please let me know if you have found a solution for this issue; I've been stuck on it for a month and have tried multiple things with no luck.

VNImagePointForNormalizedPoint might be what you are looking for?
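If you go down that path, the rect variant may be more convenient for a boundingBox. A minimal sketch (the 1920 x 1440 default is the capturedImage resolution mentioned above; note this yields pixel coordinates in the image, not screen points, so the aspect-fill crop still has to be handled separately):

        import Vision

        // Sketch: convert Vision's normalized boundingBox into pixel coordinates
        // of the processed image (not screen points).
        func pixelRect(for observation: VNFaceObservation,
                       imageWidth: Int = 1920,
                       imageHeight: Int = 1440) -> CGRect {
            return VNImageRectForNormalizedRect(observation.boundingBox, imageWidth, imageHeight)
        }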