I am trying to recognize text in an image, split it into words, and store the words in a String array. Additionally, I want to store the bounding box of each recognized word.
My code works, but for some reason the bounding boxes of words that are separated by an apostrophe rather than a space come out wrong.
Here is the complete code of my VNRecognizeTextRequest completion handler:
    let request = VNRecognizeTextRequest { request, error in
        guard let observations = request.results as? [VNRecognizedTextObservation] else {
            return
        }

        // split recognized text into words and store each word with corresponding observation
        let wordObservations = observations.flatMap { observation in
            observation.topCandidates(1).first?.string.unicodeScalars
                .split(whereSeparator: { CharacterSet.letters.inverted.contains($0) })
                .map { (observation, $0) } ?? []
        }

        // store recognized words as strings
        recognizedWords = wordObservations.map { (observation, word) in String(word) }

        // calculate bounding box for each word
        recognizedWordRects = wordObservations.map { (observation, word) in
            guard let candidate = observation.topCandidates(1).first else { return .zero }
            let stringRange = word.startIndex..<word.endIndex
            guard let rect = try? candidate.boundingBox(for: stringRange)?.boundingBox else { return .zero }
            let bottomLeftOriginRect = VNImageRectForNormalizedRect(rect, Int(captureRect.width), Int(captureRect.height))

            // adjust coordinate system to start in top left corner
            let topLeftOriginRect = CGRect(origin: CGPoint(x: bottomLeftOriginRect.minX,
                                                           y: captureRect.height - bottomLeftOriginRect.height - bottomLeftOriginRect.minY),
                                           size: bottomLeftOriginRect.size)
            print("BoundingBox for word '\(String(word))': \(topLeftOriginRect)")
            return topLeftOriginRect
        }
    }
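The coordinate conversion at the end of the handler can be checked on its own. The following is only a minimal sketch: `topLeftRect(fromNormalized:in:)` is a hypothetical helper that is not part of my project, and it scales the normalized rect by hand instead of calling `VNImageRectForNormalizedRect`, so it runs without Vision. Unlike the handler, it does not truncate the width and height to `Int` first.

```swift
import Foundation

// Hypothetical helper (not from the original code): converts a normalized,
// bottom-left-origin rect, as Vision reports it, into a top-left-origin
// rect measured in the coordinate space of `size`.
func topLeftRect(fromNormalized rect: CGRect, in size: CGSize) -> CGRect {
    // Scale the normalized rect up to the target size.
    let scaled = CGRect(x: rect.minX * size.width,
                        y: rect.minY * size.height,
                        width: rect.width * size.width,
                        height: rect.height * size.height)
    // Flip the y axis so the origin sits in the top left corner.
    return CGRect(x: scaled.minX,
                  y: size.height - scaled.height - scaled.minY,
                  width: scaled.width,
                  height: scaled.height)
}

// Example values are made up for illustration.
let rect = topLeftRect(fromNormalized: CGRect(x: 0.25, y: 0.5, width: 0.5, height: 0.25),
                       in: CGSize(width: 100, height: 200))
print(rect)
```

With these made-up inputs the flipped rect has origin (25, 50) and size 50 x 50, so the flip itself does not appear to be what makes two words share a box.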
And here's an example of what's happening. When I'm processing the following image:
the code above produces the following output:
BoundingBox for word 'In': (23.00069557577264, 5.718113962610181, 45.89460636656961, 32.78087073878238)
BoundingBox for word 'un': (71.19064286904202, 6.289275587192936, 189.16024359557852, 34.392966621800475)
BoundingBox for word 'intervista': (71.19064286904202, 6.289275587192936, 189.16024359557852, 34.392966621800475)
BoundingBox for word 'del': (262.64622870703477, 8.558512219726875, 54.733978711037985, 32.79967358237818)
Notice how the bounding boxes of the words 'un' and 'intervista' are exactly the same. This happens consistently for words that are separated by an apostrophe. Why is that?
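To rule out the splitting step, the split can be run in isolation. This is a minimal sketch that assumes the recognized line is the string "In un'intervista del"; the exact string Vision returns for the image is an assumption here, and the scalar offsets are only printed to make the ranges easy to compare.

```swift
import Foundation

// Assumed sample line; the actual string returned by Vision may differ.
let line = "In un'intervista del"
let scalars = line.unicodeScalars

// Same split as in the handler: break on anything that is not a letter.
let words = scalars.split(whereSeparator: { CharacterSet.letters.inverted.contains($0) })

for word in words {
    // Express each word's range as unicode scalar offsets into the full line.
    let start = scalars.distance(from: scalars.startIndex, to: word.startIndex)
    let end = scalars.distance(from: scalars.startIndex, to: word.endIndex)
    print("'\(String(word))': scalars \(start)..<\(end)")
}
```

Under that assumption, 'un' covers scalars 3..<5 and 'intervista' covers 6..<16, so the ranges passed to `boundingBox(for:)` would be different even though the returned boxes are identical.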
Thank you for any help
Elias