Hi everyone,
I'm working on an iOS app in Swift (Xcode) that uses Roboflow's object detection API to locate key fields on images of grocery receipts (items, total, tax, etc.) so that I can extract their text.
I can send images to the Roboflow API and get back predictions with bounding box data, but when I try to extract text from the detected regions, no text is recognized. It looks like either the bounding boxes aren't being handled properly or something is going wrong in the way I process the API response.
Here's a brief breakdown of what I'm doing:
1. The image is captured, converted to base64, and sent to the Roboflow API.
2. The API response comes back with bounding boxes for the detected elements (items, date, subtotal, etc.).
3. When I try to extract text from the image using the bounding box data, the predictions are present in the response, but no text is returned.
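The text-extraction step above could be sketched with Apple's Vision framework. This is only a sketch under assumptions: the post does not say which OCR library is in use, so Vision is a guess; `recognizeText(in:region:)` is a hypothetical helper name; and `region` is assumed to be a top-left-origin rectangle in pixel coordinates of the same image that was sent to Roboflow (no resizing or orientation change in between).

```swift
import CoreGraphics
import Vision

// Hypothetical helper: crops one detected region and runs OCR on it.
// `perform` is synchronous, so call this off the main thread.
func recognizeText(in image: CGImage, region: CGRect) throws -> [String] {
    // cropping(to:) returns nil if the rect does not intersect the image.
    guard let cropped = image.cropping(to: region) else { return [] }

    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate

    let handler = VNImageRequestHandler(cgImage: cropped, options: [:])
    try handler.perform([request])

    // Each observation is one detected line of text; keep its best candidate.
    return (request.results ?? []).compactMap { $0.topCandidates(1).first?.string }
}
```

If this returns an empty array for a region that clearly contains text, the crop rectangle is the first thing to check, for example by saving the cropped image and looking at it.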
I suspect that either the app's segue to the results view controller is triggered before the OCR extraction completes, or there is a bug in my code that reads the bounding boxes from the response.
Response Data:
{
  "inference_id": "77134cce-91b5-4600-a59b-fab74350ca06",
  "time": 0.09240847699993537,
  "image": {
    "width": 370,
    "height": 502
  },
  "predictions": [
    {
      "x": 163.5,
      "y": 250.5,
      "width": 313.0,
      "height": 127.0,
      "confidence": 0.9357666373252869,
      "class": "Item",
      "class_id": 1,
      "detection_id": "753341d5-07b6-42a1-8926-ecbc61128243"
    },
    {
      "x": 52.5,
      "y": 417.5,
      "width": 89.0,
      "height": 23.0,
      "confidence": 0.8819760680198669,
      "class": "Date",
      "class_id": 0,
      "detection_id": "b4681149-d538-47b1-8700-d9528bf1daa0"
    },
    ...
  ]
}
And the log showing bounding boxes:
Prediction: ["width": 313, "y": 250.5, "x": 163.5, "detection_id": 753341d5-07b6-42a1-8926-ecbc61128243, "class": Item, "height": 127, "confidence": 0.9357666373252869, "class_id": 1]
No bounding box found in prediction.
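The log shows the prediction carrying `x`, `y`, `width`, and `height` as top-level keys, so one possible cause of the "No bounding box found" message is code that looks for a nested box object that the response does not contain. A minimal sketch of reading the flat fields with `Codable` follows. The type names and `cropRects(from:)` are hypothetical, and the code assumes that `x`/`y` are the center of the box in pixels (worth confirming against Roboflow's documentation), so they are shifted by half the size to get a top-left-origin `CGRect`.

```swift
import Foundation

// Mirrors only the response fields needed here.
struct Prediction: Decodable {
    let x: Double
    let y: Double
    let width: Double
    let height: Double
    let confidence: Double
    let className: String

    enum CodingKeys: String, CodingKey {
        case x, y, width, height, confidence
        case className = "class"   // "class" is a reserved word in Swift
    }

    // Assumption: x/y are the box center, so shift by half the size.
    var rect: CGRect {
        CGRect(x: x - width / 2, y: y - height / 2, width: width, height: height)
    }
}

struct InferenceResponse: Decodable {
    struct ImageSize: Decodable {
        let width: Double
        let height: Double
    }
    let image: ImageSize
    let predictions: [Prediction]
}

// Decodes the response and returns one crop rectangle per prediction,
// clipped to the image bounds reported by the API.
func cropRects(from data: Data) throws -> [(label: String, rect: CGRect)] {
    let response = try JSONDecoder().decode(InferenceResponse.self, from: data)
    let bounds = CGRect(x: 0, y: 0,
                        width: response.image.width,
                        height: response.image.height)
    return response.predictions.map {
        (label: $0.className, rect: $0.rect.intersection(bounds))
    }
}
```

Under that center-point assumption, the first prediction in the response above maps to a rectangle with origin (7, 187) and size 313 x 127.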
I've double-checked the bounding box coordinates, and everything seems fine. Does anyone have experience with using OCR alongside object detection APIs in Swift? Any help on how to ensure the bounding boxes are properly processed and used for OCR would be greatly appreciated!
Also, would it help to delay the segue to the results view controller until OCR is complete?
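One way to hold the segue until every region has been processed is a `DispatchGroup`. The sketch below is illustrative only: `recognizeText(forRegion:completion:)` is a hypothetical stand-in for the real asynchronous OCR call, `ResultStore` and `runOCR` are made-up names, and regions are identified by a plain label string to keep the example self-contained.

```swift
import Foundation

// Hypothetical stand-in for the real per-region OCR call.
func recognizeText(forRegion label: String, completion: @escaping (String) -> Void) {
    DispatchQueue.global().async { completion("text for \(label)") }
}

// Thread-safe storage, since OCR callbacks may arrive on different threads.
final class ResultStore: @unchecked Sendable {
    private let lock = NSLock()
    private var values: [String: String] = [:]

    func set(_ text: String, for label: String) {
        lock.lock()
        values[label] = text
        lock.unlock()
    }

    var snapshot: [String: String] {
        lock.lock()
        defer { lock.unlock() }
        return values
    }
}

// Runs OCR for every region and calls `completion` once, after all have finished.
func runOCR(onRegions labels: [String],
            callbackQueue: DispatchQueue = .main,
            completion: @escaping ([String: String]) -> Void) {
    let group = DispatchGroup()
    let store = ResultStore()

    for label in labels {
        group.enter()
        recognizeText(forRegion: label) { text in
            store.set(text, for: label)
            group.leave()
        }
    }

    // Fires only after every enter() has been matched by a leave().
    group.notify(queue: callbackQueue) {
        completion(store.snapshot)
    }
}

// Usage inside the view controller ("showResults" is a hypothetical segue identifier):
// runOCR(onRegions: labels) { results in
//     self.performSegue(withIdentifier: "showResults", sender: results)
// }
```

With this shape the segue is triggered from the completion handler rather than right after the API call returns, so the results view controller always receives the finished OCR output.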
Thank you!