Hello,
I am working on an app that scans documents and recognizes the text with the help of the Vision framework. This works great.
I would also like to "recognize" or detect individual images that are part of the document. Does Vision have any support for this, or should I look into training my own ML model?
Below is an example document. I would like to extract the text (already done) and also the image of the building.