Hey there, I am currently building an app which requires accurate text detection, including properly detecting text that is split up into multiple rows, as well as properly grouping paragraphs. The Live Text feature in the Camera app does exactly that, but will this also be added to the Vision framework, or is there already a way to do this?
Live Text recognition in Vision framework
Hey! Did you find a good answer to this? I'm looking into something similar.
Hi! It seems like nothing has been added on this front. It's odd. In the debugger I can see that observation objects have a property _isTitle, which seems to accurately determine if a piece of text is a title on a page, at least using the images I tried. It seems to be a private property, but its presence gestures towards more functionality that Apple could provide.
It seems to me like we're stuck trying to do formatting work by hand. I made a project to grab text from an old book. I'm playing around with comparing lengths of the observation's boundingBox (or frame in the file I linked) to determine whether or not to add a newline and a tab to my output. It doesn't work very well at the moment. Still trying to think of other ways to duplicate paragraph structure that Mac Preview and the iOS Camera app support.
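For anyone who wants to try the same thing, here is a rough sketch of the bounding-box idea. It is not the code from my project and certainly not how Live Text does it. RecognizedLine, joinIntoParagraphs, shortLineRatio and indentTolerance are all made-up names for illustration; the only assumption about Vision is that each observation's boundingBox is normalized to 0...1 with the origin at the lower left. A line starts a new paragraph if the previous line ends well short of the right edge of the text block, or if the line itself is indented.

```swift
// Hypothetical stand-in for the values read from each text observation
// (its top candidate string plus its normalized boundingBox).
struct RecognizedLine {
    let text: String
    let minX: Double   // left edge of the bounding box, 0...1
    let minY: Double   // bottom edge, 0...1, origin at the lower left
    let width: Double  // bounding box width, 0...1
}

// Joins lines into paragraphs, writing a newline and a tab at each guessed
// paragraph start. Both thresholds are guesses and need tuning per document.
func joinIntoParagraphs(_ lines: [RecognizedLine],
                        shortLineRatio: Double = 0.85,
                        indentTolerance: Double = 0.02) -> String {
    // Top of the page first: the normalized y axis grows upward.
    let sorted = lines.sorted { $0.minY > $1.minY }
    guard let first = sorted.first else { return "" }

    // Estimate the text block from the extreme left and right edges.
    let leftMargin = sorted.map { $0.minX }.min() ?? 0
    let rightEdge = sorted.map { $0.minX + $0.width }.max() ?? 1
    let earlyEndLimit = leftMargin + (rightEdge - leftMargin) * shortLineRatio

    var output = "\t" + first.text
    for (previous, current) in zip(sorted, sorted.dropFirst()) {
        let previousEndsEarly = previous.minX + previous.width < earlyEndLimit
        let currentIsIndented = current.minX - leftMargin > indentTolerance
        if previousEndsEarly || currentIsIndented {
            output += "\n\t" + current.text   // guessed paragraph break
        } else {
            output += " " + current.text      // same paragraph, keep flowing
        }
    }
    return output
}
```

This only handles a single column of text. Multi-column pages, centered titles, and skewed scans would all need extra handling, which is probably why my own attempt doesn't work very well yet.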
Hopefully this bumps the thread and keeps the subject alive. There are a handful of questions like this that have gotten a lot of views but no responses :(