Detect images in document scans with Vision?

Hello,

I am working on an app that scans documents and recognizes the text with the help of the Vision framework. This works great.

I would also like to "recognize" or detect individual images that are part of the document. Does Vision have any support for this, or should I be looking into training my own ML model?

Below is an example document - I would like to extract the text (already done) and also the image of the building.

Hello,

Vision does not have an API that does exactly this. First, I recommend that you file an enhancement request for an API with this functionality using Feedback Assistant.

Having said that, depending on how uniform your input documents are, you can probably cobble together a solution that works using VNDetectRectanglesRequest along with some heuristics to find the sub-rects of your documents that contain an image.
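As a rough illustration of that idea, here is a minimal sketch, not a tested solution. It assumes you already have the bounding boxes of your recognized text (for example from VNRecognizedTextObservation.boundingBox) and a recent SDK where the request's results are typed. The helper names likelyImageRegions and detectCandidateRectangles, and the thresholds minArea and maxTextOverlap, are hypothetical and would need tuning for your documents.

```swift
import Foundation
#if canImport(Vision)
import Vision
#endif

/// Hypothetical heuristic: keep candidate rectangles that are reasonably large
/// and are not mostly covered by recognized text.
/// All rects are assumed to be in Vision's normalized coordinates (0...1).
func likelyImageRegions(candidates: [CGRect],
                        textBoxes: [CGRect],
                        minArea: CGFloat = 0.02,
                        maxTextOverlap: CGFloat = 0.3) -> [CGRect] {
    return candidates.filter { rect in
        let area = rect.size.width * rect.size.height
        guard area >= minArea else { return false }

        // Sum the area of each text box that falls inside this rectangle.
        var covered: CGFloat = 0
        for box in textBoxes {
            let left = max(rect.origin.x, box.origin.x)
            let right = min(rect.origin.x + rect.size.width,
                            box.origin.x + box.size.width)
            let bottom = max(rect.origin.y, box.origin.y)
            let top = min(rect.origin.y + rect.size.height,
                          box.origin.y + box.size.height)
            covered += max(0, right - left) * max(0, top - bottom)
        }
        return covered / area <= maxTextOverlap
    }
}

#if canImport(Vision)
/// Runs VNDetectRectanglesRequest and returns the normalized bounding boxes.
/// The parameter values here are guesses, not recommendations.
func detectCandidateRectangles(in image: CGImage) throws -> [CGRect] {
    let request = VNDetectRectanglesRequest()
    request.maximumObservations = 16   // the default only returns one rectangle
    request.minimumSize = 0.1          // fraction of the smaller image dimension
    request.minimumConfidence = 0.5

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])
    return request.results?.map { $0.boundingBox } ?? []
}
#endif
```

You would pass the output of detectCandidateRectangles(in:) together with your text bounding boxes to likelyImageRegions, then convert the surviving rects to pixel coordinates (for example with VNImageRectForNormalizedRect) before cropping. Note that rectangle detection relies on visible edges, so images without a clear border against the page may not be found this way.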

Thanks @gchiste! I will look into the VNDetectRectanglesRequest and file the enhancement request.
