VNRecognizeTextRequest from a text drawn in UIImageView returns empty results

Here is the setup.

I have a UIImageView in which I draw some text, using UIGraphicsBeginImageContext.

I pass this image to the OCR func:

    func ocrText(onImage: UIImage?) {

        let request = VNRecognizeTextRequest { request, error in
            guard let observations = request.results as? [VNRecognizedTextObservation] else {
                fatalError("Received invalid observations")
            }
            print("observations", observations.count)
            for observation in observations {
                if observation.topCandidates(1).isEmpty {
                    continue
                }

            }
        }       // end of request handler

        request.recognitionLanguages = ["fr"]
        let requests = [request]

        DispatchQueue.global(qos: .userInitiated).async {
            let ocrGroup = DispatchGroup()

            guard let img = onImage?.cgImage else { return }  // Conversion to cgImage works OK

            print("img", img, img.width)
            let (_, _) = onImage!.logImageSizeInKB(scale: 1)

            ocrGroup.enter()

            let handler = VNImageRequestHandler(cgImage: img, options: [:])
            try? handler.perform(requests)

            ocrGroup.leave()

            ocrGroup.wait()

        }
    }

The problem is that observations is an empty array. I get the following logs:

img <CGImage 0x7fa53b350b60> (DP)
	<<CGColorSpace 0x6000032f1e00> (kCGColorSpaceICCBased; kCGColorSpaceModelRGB; sRGB IEC61966-2.1)>
		width = 398, height = 164, bpc = 8, bpp = 32, row bytes = 1600 
		kCGImageAlphaPremultipliedFirst | kCGImageByteOrder32Little  | kCGImagePixelFormatPacked 
		is mask? No, has masking color? No, has soft mask? No, has matte? No, should interpolate? Yes 398
ImageSize(KB): 5 ko
2022-06-02 17:21:03.734258+0200 App[6949:2718734] Metal API Validation Enabled
observations 0

This shows that the image is loaded and converted to a CGImage correctly, yet there are no observations.
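Note that `try?` in the function above discards any error thrown by `perform`, so a failure inside Vision would look the same as an empty result. A minimal sketch of a variant that surfaces errors and returns the recognized strings follows. The name `recognizedStrings(in:languages:)` is hypothetical, and the code assumes an iOS 15 or later SDK, where `VNRecognizeTextRequest.results` is already typed as `[VNRecognizedTextObservation]?`.

```swift
import UIKit
import Vision

/// Runs text recognition on `image` and returns the top candidate string
/// of each observation. Hypothetical helper, not part of the original code.
func recognizedStrings(in image: UIImage, languages: [String] = ["fr"]) throws -> [String] {
    // No backing CGImage means there is nothing to recognize.
    guard let cgImage = image.cgImage else { return [] }

    let request = VNRecognizeTextRequest()
    request.recognitionLanguages = languages

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    // perform(_:) is synchronous and throws on failure,
    // so errors are propagated instead of being swallowed by try?.
    try handler.perform([request])

    let observations = request.results ?? []
    return observations.compactMap { $0.topCandidates(1).first?.string }
}
```

Because `perform` blocks, this would typically be called from a background queue, inside a `do`/`catch` that logs the error.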

Now, if I use the same function on a screenshot of the text drawn on screen, it works correctly.

Is there a difference between an image created by the camera and an image drawn in a CGContext?

Here is how mainImageView!.image (used in ocr) is created in a subclass of UIImageView:

    override func touchesEnded(_ touches: Set<UITouch>, with event: UIEvent?) {
        
        // Merge tempImageView into mainImageView
        UIGraphicsBeginImageContext(mainImageView!.frame.size)
        mainImageView!.image?.draw(in: CGRect(x: 0, y: 0, width: frame.size.width, height: frame.size.height), blendMode: .normal, alpha: 1.0)
        tempImageView!.image?.draw(in: CGRect(x: 0, y: 0, width: frame.size.width, height: frame.size.height), blendMode: .normal, alpha: opacity)
        mainImageView!.image = UIGraphicsGetImageFromCurrentImageContext()
        UIGraphicsEndImageContext()

        tempImageView?.image = nil
        
    }

I also displayed the created image in a test UIImageView, and it shows the correct image.

Here are the logs for the drawn text and for the screenshot:

Drawing: doesn't work
img <CGImage 0x7fb96b81a030> (DP)
	<<CGColorSpace 0x600003322160> (kCGColorSpaceICCBased; kCGColorSpaceModelRGB; sRGB IEC61966-2.1)>
		width = 398, height = 164, bpc = 8, bpp = 32, row bytes = 1600 
		kCGImageAlphaPremultipliedFirst | kCGImageByteOrder32Little  | kCGImagePixelFormatPacked 
		is mask? No, has masking color? No, has soft mask? No, has matte? No, should interpolate? Yes 398
ImageSize(KB): 5 ko
2022-06-02 15:38:51.115476+0200 Numerare[5313:2653328] Metal API Validation Enabled
observations 0

Screenshot: works
img <CGImage 0x7f97641720f0> (IP)
	<<CGColorSpace 0x60000394c960> (kCGColorSpaceICCBased; kCGColorSpaceModelRGB; iMac)>
		width = 570, height = 276, bpc = 8, bpp = 32, row bytes = 2280 
		kCGImageAlphaNoneSkipLast | 0 (default byte order)  | kCGImagePixelFormatPacked 
		is mask? No, has masking color? No, has soft mask? No, has matte? No, should interpolate? Yes 570
ImageSize(KB): 5 ko
2022-06-02 15:43:32.158701+0200 Numerare[5402:2657059] Metal API Validation Enabled
2022-06-02 15:43:33.122941+0200 Numerare[5402:2657057] [WARNING] Resource not found for 'fr_FR'. Character language model will be disabled during language correction.
observations 1

Is there an issue with kCGColorSpaceModelRGB?
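One visible difference between the two logs, besides the color space, is the alpha field: the drawn image reports kCGImageAlphaPremultipliedFirst while the screenshot reports kCGImageAlphaNoneSkipLast. As a small sketch for checking this in code rather than reading the log, the helper below inspects `CGImage.alphaInfo`. The name `hasAlphaChannel` is hypothetical.

```swift
import CoreGraphics

/// Returns true when the image's pixel format carries an alpha channel.
/// Hypothetical helper for comparing the drawn image with the screenshot.
func hasAlphaChannel(_ image: CGImage) -> Bool {
    switch image.alphaInfo {
    case .none, .noneSkipFirst, .noneSkipLast:
        // Opaque formats: any fourth byte per pixel is ignored.
        return false
    case .first, .last, .premultipliedFirst, .premultipliedLast, .alphaOnly:
        return true
    @unknown default:
        // Treat formats added in the future as possibly transparent.
        return true
    }
}
```

Calling it on `onImage?.cgImage` before handing the image to Vision would show whether a given image is the transparent kind or the opaque kind.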

Answered by Claude31 in 715577022

I finally found a way to get it.

I save the image to a file as JPEG and read the file back.

This didn't work with PNG, but it works with JPEG.

Here is the simple code (in case someone has a better solution to propose):

            var formattedImage: UIImage?

            let imageData = imageView?.image!.jpegData(compressionQuality: 1.0)  // image is drawn in an imageView
            let fileManager = FileManager.default
            let paths = NSSearchPathForDirectoriesInDomains(
                FileManager.SearchPathDirectory.documentDirectory,
                FileManager.SearchPathDomainMask.userDomainMask, true)
            let documentsDirectory = paths[0] as NSString
            let fileExt = "TempJpegImage.jpg"
            let fileName = documentsDirectory.appendingPathComponent(fileExt) as String

            fileManager.createFile(atPath: fileName, contents: imageData, attributes: nil)  // Let's create a temp file

            if let imageJPEG = UIImage(contentsOfFile: fileName) {  // Read the image back
                formattedImage = imageJPEG
            }

Now, formattedImage is passed successfully to ocrText.

Some images are logged as

<CGImage 0x7fb96b81a030> (DP)

and others as

<CGImage 0x7fb96b81a030> (IP)

Does anyone know the difference between DP and IP?

Found a partial answer here: https://github.com/SDWebImage/SDWebImage/issues/3330

[DP] represents Data Provider (CGDataProvider) and [IP] represents Image Provider (CGImageProvider).

This could mean that recognition works with an image provider but not with a data provider.

So the question would now be: how do I create the cgImage with an image provider instead of a data provider?

EDITED

If I save the UIImage to the photo album:

        UIImageWriteToSavedPhotosAlbum(image, nil, nil, nil)

and read the image back from the album, it works.

I can now sum up the question:

  • I start with a UIImage.

  • If I save it to the photo library and read it back with the picker, it works.

  • How can I achieve this image format change (metadata?) directly, without going through the photo library?

Should I create a temporary album with a single photo, read it programmatically from this album, and delete it once done?

Hey Claude,

"I finally found a way to get it. I save the image to a file as jpeg and read the file back."

It's not clear to me why this should be necessary. At a minimum, I think you should file a bug report for this issue.

I'm also not 100% clear on the source of the onImage that you are passing to Vision. If you file a Technical Support Incident for this issue and provide a focused sample project that reproduces it, I can take a look at your image pipeline to see if there is an issue there.

I did file a TSI a few hours ago (Case ID: 801319708). I will file a bug report as well.

What is really surprising is that passing a bitmap image created in the UIImageView with UIGraphicsBeginImageContext does not work…

For anyone this may help: I got a quick answer from DTS (thanks @gchiste):

The critical difference here is that the jpeg loses the alpha channel and fills the zero alpha areas with white, whereas the png and the bitmap preserve the alpha channel.

As a workaround, you can fill your context with a background (preferably a background that contrasts your text color), and then the bitmap will recognize the same as the jpeg.

For example, a modified implementation:

    override func touchesEnded(_ touches: Set<UITouch>, with event: UIEvent?) {
        
        if !swiped {
            // draw a single point
            drawLineFrom(fromPoint: lastPoint, toPoint: lastPoint)
        }

        // Merge tempImageView into mainImageView
        UIGraphicsBeginImageContext(frame.size) 
        
        let context = UIGraphicsGetCurrentContext()!
        
        context.setFillColor(red: 1, green: 1, blue: 1, alpha: 1) // Or use the view backgroundcolor
        context.fill(.init(origin: .zero, size: frame.size))
        
        mainImageView!.image?.draw(in: bounds, blendMode: .normal, alpha: 1.0) 
        tempImageView!.image?.draw(in: bounds, blendMode: .normal, alpha: 1.0)
        mainImageView!.image = UIGraphicsGetImageFromCurrentImageContext()
        UIGraphicsEndImageContext()

        tempImageView?.image = nil
        
    }
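If changing the drawing code is not convenient, the same idea can be applied to an already-created image just before recognition. The following is a minimal sketch, assuming UIGraphicsImageRenderer (iOS 10 and later) is acceptable in place of UIGraphicsBeginImageContext. The name `flattened(_:over:)` is hypothetical and is not part of the DTS answer.

```swift
import UIKit

/// Redraws `image` over an opaque background so that transparent areas
/// take the `background` color. Hypothetical helper.
func flattened(_ image: UIImage, over background: UIColor = .white) -> UIImage {
    let format = UIGraphicsImageRendererFormat()
    format.opaque = true          // the output has no alpha channel
    format.scale = image.scale    // keep the source pixel density

    let renderer = UIGraphicsImageRenderer(size: image.size, format: format)
    return renderer.image { context in
        // Fill first, then draw the original image on top of the fill.
        background.setFill()
        context.fill(CGRect(origin: .zero, size: image.size))
        image.draw(at: .zero)
    }
}
```

The result could then be passed to ocrText in place of the original image. As with the fill color above, a background that contrasts with the stroke color is the safer choice.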

The problem was effectively solved… until I ran a first test on the iOS 16 simulator with the Xcode 14 beta.

There, recognition is very poor, and the recognition rate for some single letters (an L, for instance) is literally zero.

Did something change in iOS 16? I filed a bug report (FB10066541) on Jun 7, 2022 at 3:28 PM.

I was skeptical and dismissed this thread, which was a mistake. This is not a bug: the Vision framework needs a background to detect anything. In my case, I wanted to analyze PencilKit drawings with Vision.

I was able to get really good results with this small workaround to add a background to the drawn content:

    private func getImage() -> UIImage {
        let image = canvas.drawing.image(from: canvas.bounds, scale: 1)    // use your image here

        if let data = image.jpegData(compressionQuality: 1), let imageWithBackground = UIImage(data: data) {
            return imageWithBackground
        }
        return image
    }

It's not necessary to save it to a file for this.

The recognition works without a problem on iOS 16 in my case.
