VisionKit - get bounding boxes from ImageAnalysis

Question

Created Dec ’22

I am developing a command-line application to extract text from images and PDF files. The ImageAnalysis class from VisionKit provides high-quality OCR, but it does not appear to expose the positions of the extracted text (words, etc.).

This functionality appears to exist in a private API, since ImageAnalysisOverlayView uses it to draw the Live Text interface. Is there any way to get this information in a terminal application that displays no UI? (Note: I filed a feedback request for this over three months ago and have yet to hear back.)

Answer 1

bxroberts OP

Dec ’22

This functionality would be very useful to my work as well!

The reason is that ImageAnalysis seems to produce the highest-quality OCR, but it doesn't output positions. To get positions, I have to use VNRecognizeTextRequest from the Vision framework, which seems to offer slightly lower-quality OCR.
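For reference, here is a minimal sketch of that Vision workaround, assuming a macOS command-line target and a single-frame image file (PDF pages would first have to be rendered to a CGImage, which is not shown). `VNRecognizeTextRequest` returns one `VNRecognizedTextObservation` per line of text; its `boundingBox` is normalized (0 to 1) with the origin at the bottom-left, and `VNRecognizedText.boundingBox(for:)` returns a box for a substring range, such as a single word. The names `pixelRect`, `RecognizedLine`, `wordRects` and `recognizeLines(in:)` are hypothetical helpers, not Apple API. The Vision-specific part is wrapped in `#if canImport(Vision)` so that the coordinate helper also compiles on other platforms.

```swift
import Foundation
#if canImport(Vision)
import ImageIO
import Vision
#endif

/// Hypothetical helper: converts a Vision-style normalized rect (0...1, origin at
/// the bottom-left) into pixel coordinates with the origin at the top-left.
func pixelRect(fromNormalized r: CGRect, imageWidth: Int, imageHeight: Int) -> CGRect {
    let w = CGFloat(imageWidth)
    let h = CGFloat(imageHeight)
    return CGRect(x: r.minX * w,
                  y: (1 - r.maxY) * h, // flip the y axis
                  width: r.width * w,
                  height: r.height * h)
}

#if canImport(Vision)
/// Hypothetical result type: one recognized line, its box, and per-word boxes.
struct RecognizedLine {
    let text: String
    let confidence: Float
    let rect: CGRect
    let words: [(String, CGRect)]
}

/// Word-level boxes: VNRecognizedText can return a box for any substring range.
func wordRects(in text: VNRecognizedText, imageWidth: Int, imageHeight: Int) -> [(String, CGRect)] {
    var result: [(String, CGRect)] = []
    let s = text.string
    s.enumerateSubstrings(in: s.startIndex..<s.endIndex, options: .byWords) { word, range, _, _ in
        guard let word = word,
              let box = try? text.boundingBox(for: range) else { return }
        result.append((word, pixelRect(fromNormalized: box.boundingBox,
                                       imageWidth: imageWidth, imageHeight: imageHeight)))
    }
    return result
}

/// Runs Vision text recognition on a single-frame image file and returns each
/// line with its bounding box in pixel coordinates.
func recognizeLines(in url: URL) throws -> [RecognizedLine] {
    guard let source = CGImageSourceCreateWithURL(url as CFURL, nil),
          let image = CGImageSourceCreateImageAtIndex(source, 0, nil) else {
        throw CocoaError(.fileReadCorruptFile)
    }
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate   // slower, higher-quality mode
    request.usesLanguageCorrection = true

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])         // runs synchronously; fine for a CLI

    return (request.results ?? []).compactMap { (obs: VNRecognizedTextObservation) -> RecognizedLine? in
        guard let best = obs.topCandidates(1).first else { return nil }
        return RecognizedLine(
            text: best.string,
            confidence: best.confidence,
            rect: pixelRect(fromNormalized: obs.boundingBox,
                            imageWidth: image.width, imageHeight: image.height),
            words: wordRects(in: best, imageWidth: image.width, imageHeight: image.height))
    }
}
#endif
```

On macOS, `try recognizeLines(in: URL(fileURLWithPath: "scan.png"))` would yield each line's text with its rectangle and per-word rectangles. The `.accurate` recognition level is Vision's slower, more thorough mode, but as the thread notes, its output may still not match what ImageAnalysis produces.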