I'm using the Vision framework to perform OCR with a `VNRecognizeTextRequest`, and I'm trying to locate each individual character in the resulting observations' `VNRecognizedText` candidates. However, when I call the `boundingBox(for range: Range<String.Index>)` method on any recognized text, for any range within that text, I always get the same bounding box back: the bounding box of the entire string.
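For reference, here is a minimal sketch of the kind of per-character query described above. It assumes an already-loaded `CGImage` and macOS 10.15 / iOS 13 or later. The helper names `characterRanges(in:)` and `dumpCharacterBoxes(in:level:)` are hypothetical, and the Vision part is wrapped in `#if canImport(Vision)` so the pure-Swift range helper also builds on other platforms.

```swift
import Foundation
#if canImport(Vision)
import CoreGraphics
import Vision
#endif

/// Hypothetical helper: one single-Character range per Character in `text`.
func characterRanges(in text: String) -> [Range<String.Index>] {
    var ranges: [Range<String.Index>] = []
    var index = text.startIndex
    while index < text.endIndex {
        let next = text.index(after: index)   // next grapheme-cluster boundary
        ranges.append(index..<next)
        index = next
    }
    return ranges
}

#if canImport(Vision)
/// Hypothetical helper: runs text recognition on `image` and prints the box
/// Vision reports for every character of each observation's top candidate.
func dumpCharacterBoxes(in image: CGImage,
                        level: VNRequestTextRecognitionLevel = .accurate) throws {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = level
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])   // synchronous; results are ready afterwards

    let observations = request.results as? [VNRecognizedTextObservation] ?? []
    for observation in observations {
        guard let candidate = observation.topCandidates(1).first else { continue }
        let text = candidate.string
        for range in characterRanges(in: text) {
            // Normalized coordinates (0...1) with a bottom-left origin.
            let box = try candidate.boundingBox(for: range)?.boundingBox
            print(text[range], String(describing: box))
        }
    }
}
#endif
```

Each range covers exactly one `Character` (an extended grapheme cluster), so it is always a valid `Range<String.Index>` into the candidate's `string`.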
Am I misunderstanding the `boundingBox(for:)` method, or is there some other way to get discrete location info for single characters within a recognized text observation?
Thanks in advance!
After looking into this more, I've realized that the behavior is somehow tied to word groups and whitespace.
Consider a recognized text observation whose string value is "Foo bar". Calling `boundingBox(for:)` for each character in "Foo" returns exactly the same bounding box, which, judging by its dimensions, corresponds to the whole substring "Foo" rather than to the single character whose range I pass into `boundingBox(for:)`. In another bit of strange behavior, the box for the whitespace character is simply an empty region at the origin whose edges don't line up with the substrings on either side of it. Finally, the second substring behaves like the first: every character in "bar" has the same bounding box.
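The pattern above can be checked mechanically by collapsing consecutive characters that report an identical box. In this sketch, `BoxRun` and `groupByBox(_:boxFor:)` are hypothetical names, and the `boxFor` closure stands in for something like `{ try? candidate.boundingBox(for: $0)?.boundingBox }` on a real `VNRecognizedText`, so the grouping logic can run without Vision.

```swift
import Foundation

/// Hypothetical type: a run of consecutive characters that share one box.
struct BoxRun {
    let text: String
    let box: CGRect?
}

/// Groups consecutive characters of `text` whose reported boxes are identical.
/// `boxFor` is an assumed stand-in for the Vision per-range lookup.
func groupByBox(_ text: String,
                boxFor: (Range<String.Index>) -> CGRect?) -> [BoxRun] {
    var runs: [BoxRun] = []
    var index = text.startIndex
    while index < text.endIndex {
        let next = text.index(after: index)
        let piece = String(text[index..<next])
        let box = boxFor(index..<next)
        if let last = runs.last, last.box == box {
            // Same box as the previous character: extend the current run.
            runs[runs.count - 1] = BoxRun(text: last.text + piece, box: box)
        } else {
            runs.append(BoxRun(text: piece, box: box))
        }
        index = next
    }
    return runs
}
```

With the behavior described for "Foo bar", this yields three runs ("Foo", " ", "bar"); if every character reports a distinct box, it yields seven.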
After hours of further investigation, I got in touch with Apple Developer Technical Support, and sure enough, this is a bug! The bug manifests when `VNRecognizeTextRequest.recognitionLevel` is set to `.accurate`, as I had it. When `recognitionLevel` is set to `.fast`, the results behave as expected, with a discrete bounding box for each character.
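Based on that reply, a minimal workaround sketch is to request `.fast` recognition when per-character boxes are needed, accepting whatever recognition quality `.fast` trades away for speed. The names `characterBoxes(in:)` and `boxesLookPerCharacter(_:)` are hypothetical. The second is a cheap heuristic, under the assumption that the word-level symptom always shows up as identical non-empty boxes on adjacent characters, for flagging it if it reappears.

```swift
import Foundation
#if canImport(Vision)
import CoreGraphics
import Vision
#endif

/// Hypothetical heuristic: false if two adjacent non-empty boxes are identical,
/// which is the word-level pattern observed with `.accurate`.
func boxesLookPerCharacter(_ boxes: [CGRect]) -> Bool {
    // Skip empty boxes such as the zero-size region reported for whitespace.
    let nonEmpty = boxes.filter { $0.width > 0 && $0.height > 0 }
    for (a, b) in zip(nonEmpty, nonEmpty.dropFirst()) where a == b {
        return false
    }
    return true
}

#if canImport(Vision)
/// Hypothetical helper: recognizes text with `.fast` and returns one
/// normalized (bottom-left origin) box per character.
func characterBoxes(in image: CGImage) throws -> [(Character, CGRect)] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .fast   // `.accurate` returned word-level boxes
    try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])

    var result: [(Character, CGRect)] = []
    let observations = request.results as? [VNRecognizedTextObservation] ?? []
    for observation in observations {
        guard let candidate = observation.topCandidates(1).first else { continue }
        let text = candidate.string
        for index in text.indices {
            let range = index..<text.index(after: index)
            if let box = try candidate.boundingBox(for: range)?.boundingBox {
                result.append((text[index], box))
            }
        }
    }
    return result
}
#endif
```

A possible call site: `let boxes = try characterBoxes(in: image)`, then `boxesLookPerCharacter(boxes.map { $0.1 })` to confirm the boxes are no longer grouped by word.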