VNRecognizeTextRequest input image/pixel buffer manipulation

Question

Created Jun ’20

Hello,

Does the Vision framework do any image manipulation internally before text recognition is performed?

I'm asking because I know there's a CIDocumentEnhancer filter that I could apply to the image before passing it to the VNRecognizeTextRequest. However, I would like to avoid that step if something similar is already done internally.

If no manipulation is done internally, would you recommend doing something like the following?

import CoreImage

// Wrap the captured pixel buffer in a CIImage.
let capturedImage = CIImage(cvPixelBuffer: pixelBuffer)
// Apply the document enhancer with an amount of 5.
let filter = CIFilter(name: "CIDocumentEnhancer")
filter?.setValue(capturedImage, forKey: kCIInputImageKey)
filter?.setValue(5, forKey: kCIInputAmountKey)
let filteredImage = filter?.outputImage

Also, in my testing, the text recognition request consistently took about 10% longer when given the filtered image than when given the original captured image. Any ideas why?

Looking forward to hearing your thoughts. Thanks!

Answered by Frameworks Engineer in 618340022


Answer 1

Frameworks Engineer OP

Apple

Jul ’20

Accepted Answer

VNRecognizeTextRequest does not do any preprocessing. Depending on what you know about the content you are trying to read, you can significantly improve the results by preprocessing the image with Core Image, for example by enhancing contrast or by applying perspective correction in combination with the rectangle detector.
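
For illustration only, here is a minimal sketch of what that kind of preprocessing might look like. It is not taken from the answer above: the helper names `perspectiveCorrected` and `recognizeText`, the choice of `CIColorControls` for contrast, and the contrast value of 1.5 are all assumptions, and the code assumes a recent SDK in which the request `results` properties are typed arrays. Suitable filters and values will depend on the images you are reading.

```swift
import CoreImage
import Vision

/// Hypothetical helper: deskews the most prominent rectangle Vision finds.
/// Returns the input unchanged if no rectangle is detected.
func perspectiveCorrected(_ image: CIImage) throws -> CIImage {
    let request = VNDetectRectanglesRequest()
    request.maximumObservations = 1
    try VNImageRequestHandler(ciImage: image, options: [:]).perform([request])
    guard let rect = request.results?.first else { return image }

    // Vision reports corners in normalized coordinates (0...1, lower-left
    // origin), so scale them to the image's pixel extent for Core Image.
    let extent = image.extent
    func pixel(_ p: CGPoint) -> CIVector {
        CIVector(x: extent.origin.x + p.x * extent.width,
                 y: extent.origin.y + p.y * extent.height)
    }

    let filter = CIFilter(name: "CIPerspectiveCorrection")
    filter?.setValue(image, forKey: kCIInputImageKey)
    filter?.setValue(pixel(rect.topLeft), forKey: "inputTopLeft")
    filter?.setValue(pixel(rect.topRight), forKey: "inputTopRight")
    filter?.setValue(pixel(rect.bottomLeft), forKey: "inputBottomLeft")
    filter?.setValue(pixel(rect.bottomRight), forKey: "inputBottomRight")
    return filter?.outputImage ?? image
}

/// Hypothetical helper: boosts contrast, then runs text recognition and
/// returns the top candidate string for each detected text region.
func recognizeText(in image: CIImage, contrast: Double = 1.5) throws -> [String] {
    // The contrast value is an arbitrary starting point, not a recommendation.
    let filter = CIFilter(name: "CIColorControls")
    filter?.setValue(image, forKey: kCIInputImageKey)
    filter?.setValue(contrast, forKey: kCIInputContrastKey)
    // Fall back to the unfiltered image if the filter is unavailable.
    let input = filter?.outputImage ?? image

    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate
    try VNImageRequestHandler(ciImage: input, options: [:]).perform([request])

    let observations = request.results ?? []
    return observations.compactMap { $0.topCandidates(1).first?.string }
}
```

With the pixel buffer from the question, usage might look like `let lines = try recognizeText(in: perspectiveCorrected(CIImage(cvPixelBuffer: pixelBuffer)))`. The two steps are kept separate so that each can be dropped or swapped depending on the source images.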
