I want to swap some images in an ARKit scene, but `detectionImages` won't work because the solution needs to be agnostic to the images' real-life size.
The other alternative I see is the Vision framework, though I couldn't build a plane for the detected bounding box. In my experiments, ARKit wasn't able to detect a surface unless the camera was held well above the image, and even then I couldn't connect the detected surface back to the image.
Is there any way to overcome this?
If by "real-life size agnostic" you mean that your target may vary in size in the real world, then one solution is to use Vision to get the bounding box of the detected image, and then use the position and size of that bounding box to hit-test against feature points in your ARKit session. The distance between the hit-test results gives you a rough estimate of the target's real size, which you can then use to build an `ARReferenceImage` at runtime and add it to your `detectionImages`.
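A rough sketch of that idea, assuming you already have a Vision `boundingBox` (normalized, bottom-left origin) for the target and a `CGImage` of it. The function names and the `.up` orientation are my own choices, not a canonical recipe:

```swift
import ARKit
import simd

// Hypothetical helper: given Vision's normalized bounding box for the
// detected image, estimate the target's physical width by hit-testing
// feature points at its left and right edges.
func estimatedPhysicalWidth(of boundingBox: CGRect,
                            in sceneView: ARSCNView) -> CGFloat? {
    // Vision coordinates are normalized with the origin at the bottom-left;
    // convert to the view's top-left UIKit coordinate space.
    let viewSize = sceneView.bounds.size
    let leftPoint = CGPoint(x: boundingBox.minX * viewSize.width,
                            y: (1 - boundingBox.midY) * viewSize.height)
    let rightPoint = CGPoint(x: boundingBox.maxX * viewSize.width,
                             y: (1 - boundingBox.midY) * viewSize.height)

    // Hit-test for feature points near both horizontal edges of the box.
    guard let left = sceneView.hitTest(leftPoint, types: .featurePoint).first,
          let right = sceneView.hitTest(rightPoint, types: .featurePoint).first
    else { return nil }

    // The distance between the two world positions approximates the
    // real-world width of the target.
    let l = left.worldTransform.columns.3
    let r = right.worldTransform.columns.3
    return CGFloat(simd_distance(SIMD3(l.x, l.y, l.z),
                                 SIMD3(r.x, r.y, r.z)))
}

// Once a width estimate is available, build a runtime reference image
// and rerun the session with it, so ARKit's image detection takes over.
func addDetectionImage(_ cgImage: CGImage,
                       physicalWidth: CGFloat,
                       to sceneView: ARSCNView) {
    let reference = ARReferenceImage(cgImage,
                                     orientation: .up,
                                     physicalWidth: physicalWidth)
    reference.name = "runtime-target"
    let configuration = ARWorldTrackingConfiguration()
    configuration.detectionImages = [reference]
    sceneView.session.run(configuration, options: [.resetTracking])
}
```

Note that feature-point hit tests are noisy, so you may want to sample several frames and average the estimate before creating the reference image; `physicalWidth` doesn't have to be exact for detection to work, only close.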