VNRecognizeTextRequest input image/pixel buffer manipulation

Hello,

Does the Vision framework do any image manipulation internally before text recognition is performed?

I'm asking because I know there's a CIDocumentEnhancer filter that I could apply to the image before passing it over to the VNRecognizeTextRequest. However, I would like to avoid it, if something similar is already done internally.

If no manipulation is done internally, would you recommend doing something like the following?

Code Block
import CoreImage

// Wrap the captured frame and run it through CIDocumentEnhancer.
let capturedImage = CIImage(cvPixelBuffer: pixelBuffer)
let filter = CIFilter(name: "CIDocumentEnhancer")
filter?.setValue(capturedImage, forKey: kCIInputImageKey)
filter?.setValue(5, forKey: kCIInputAmountKey) // enhancement strength
let filteredImage = filter?.outputImage


Also, in my testing, the text recognition request consistently took ~10% longer when given the filtered image than when given the original captured image. Any ideas why?

Looking forward to hearing your thoughts. Thanks!

Accepted Reply

VNRecognizeTextRequest does not do any preprocessing. Depending on what you know about the content you are trying to read, you can significantly improve the results by preprocessing the image with Core Image, for example by enhancing contrast, or by applying perspective correction in combination with the rectangle detector.
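
To illustrate the suggestion above, here is a minimal sketch of one way to combine the rectangle detector with perspective correction and a mild contrast boost before running text recognition. The helper names perspectiveCorrected and recognizeText are hypothetical, the contrast value of 1.2 is an arbitrary starting point rather than a recommended setting, and the typed results arrays assume the iOS 15 / macOS 12 SDK or later. Treat it as a starting point under those assumptions, not as the way Vision is meant to be used.

```swift
import CoreImage
import CoreVideo
import Vision

/// Hypothetical helper: finds the most prominent rectangle in the frame and
/// rectifies it with CIPerspectiveCorrection. Returns the unmodified image
/// when no rectangle is detected.
func perspectiveCorrected(_ pixelBuffer: CVPixelBuffer) throws -> CIImage {
    let image = CIImage(cvPixelBuffer: pixelBuffer)

    let rectangleRequest = VNDetectRectanglesRequest()
    rectangleRequest.maximumObservations = 1

    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    try handler.perform([rectangleRequest])

    guard let rect = rectangleRequest.results?.first,
          let filter = CIFilter(name: "CIPerspectiveCorrection") else {
        return image
    }

    // Vision reports corners in normalized coordinates with a lower-left
    // origin, which matches Core Image, so scaling by the image size suffices.
    let size = image.extent.size
    func scaled(_ point: CGPoint) -> CIVector {
        CIVector(x: point.x * size.width, y: point.y * size.height)
    }

    filter.setValue(image, forKey: kCIInputImageKey)
    filter.setValue(scaled(rect.topLeft), forKey: "inputTopLeft")
    filter.setValue(scaled(rect.topRight), forKey: "inputTopRight")
    filter.setValue(scaled(rect.bottomLeft), forKey: "inputBottomLeft")
    filter.setValue(scaled(rect.bottomRight), forKey: "inputBottomRight")
    return filter.outputImage ?? image
}

/// Hypothetical helper: preprocesses the frame, then returns the top
/// candidate string for each recognized text observation.
func recognizeText(in pixelBuffer: CVPixelBuffer) throws -> [String] {
    let corrected = try perspectiveCorrected(pixelBuffer)

    // Assumption: a mild contrast boost; tune or drop it for your content.
    let enhanced = corrected.applyingFilter(
        "CIColorControls",
        parameters: [kCIInputContrastKey: 1.2]
    )

    let textRequest = VNRecognizeTextRequest()
    textRequest.recognitionLevel = .accurate

    let handler = VNImageRequestHandler(ciImage: enhanced, options: [:])
    try handler.perform([textRequest])

    return (textRequest.results ?? []).compactMap {
        $0.topCandidates(1).first?.string
    }
}
```

Calling recognizeText(in: pixelBuffer) from your capture callback would run the whole pipeline. Whether the extra preprocessing pays off depends on your content, so it is worth comparing recognition results and timing with and without each step.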
