I'm trying to understand what the reference coordinate system is when making a VNCoreMLRequest that has a regionOfInterest set. The results I'm getting do not appear to be consistent with other parts of Vision, nor do they appear to be valid bounding boxes in general.
var humanRequest = VNDetectHumanRectanglesRequest()
humanRequest.regionOfInterest = CGRect(x: 0.5, y: 0, width: 0.5, height: 1)
This request instructs Vision to only consider the right half of the image when looking for humans. However, the bounding boxes of any observations I get back are defined within the identity region of interest (0, 0, 1, 1). In other words, an observation whose boundingBox has an x-value of 0.5 starts in the middle of the image, not at the 3/4 mark, which would be the case if the boundingBox were defined within the coordinate system of the regionOfInterest.
In short, most Vision observations do not appear to return their bounding box in the region of interest's coordinate system.
However, when I make a VNCoreMLRequest with a custom object detection model (trained using Create ML), the coordinate system does appear to be based on the region of interest, sort of. I get reasonably stable results if I use VNNormalizedIdentityRect as the regionOfInterest, but when I specify a custom region, the boundingBox values don't line up with much of anything. Attempting to denormalize them back into the identity rect sort of works, but not reliably. And when doing so, the width and height values are sometimes invalid for Vision.
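For reference, my denormalization attempt treats the observation's boundingBox as if it were expressed in the ROI's coordinate space and maps it back into the full image's normalized (identity) space. A minimal sketch of that math (the helper name is mine, not Vision's):

```swift
import CoreGraphics

// Map a rect expressed in the ROI's normalized coordinate space back into
// the full image's normalized (identity) coordinate space.
// Assumes both rects use Vision's lower-left-origin normalized coordinates.
func denormalize(_ rect: CGRect, from roi: CGRect) -> CGRect {
    CGRect(x: roi.origin.x + rect.origin.x * roi.width,
           y: roi.origin.y + rect.origin.y * roi.height,
           width: rect.width * roi.width,
           height: rect.height * roi.height)
}
```

With the right-half ROI above, a box at x = 0.5 in ROI space would land at x = 0.75 in image space, which is the behavior I expected but am not seeing.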
ex: CGRect(x: 0.75, y: 0, width: 0.95, height: 1)
Such a bounding box implies that the box starts at an x-value of 0.75, but then its width should be at most 1.0 - 0.75 = 0.25; otherwise the box is defined to extend well past the edge of the image.
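A quick sanity check makes the problem concrete: a normalized rect is only plausible if it lies entirely within the identity rect, and the example above fails that test (the function name is mine):

```swift
import CoreGraphics

// A normalized bounding box should lie entirely within the
// identity rect (0, 0, 1, 1) to be meaningful.
func isValidNormalizedRect(_ rect: CGRect) -> Bool {
    CGRect(x: 0, y: 0, width: 1, height: 1).contains(rect)
}
```

Here `isValidNormalizedRect(CGRect(x: 0.75, y: 0, width: 0.95, height: 1))` is false, since the box extends to x = 1.7.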
Given that, how do I interpret the boundingBox of a VNRecognizedObjectObservation generated by a VNCoreMLRequest whose regionOfInterest has been set to something other than the identity rect?