I'm building an ARKit (with SceneKit) experience that requires us to automatically place 3D anchors based on the contents of a 2D overlay. To accomplish this, every N frames or so we grab the relevant 2D screen coordinates, construct the ARRaycastQuery manually (or use the convenience methods on ARFrame/ARSCNView, which amounts to the same thing), and then cast the ray.
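For concreteness, a single cast looks roughly like the sketch below. The helper name `castRay` and the default target/alignment are illustrative choices for this post, not our exact code.

```swift
import ARKit

/// Casts one ray from a 2D view-space point and returns the first hit, if any.
/// `castRay` is a hypothetical helper name used only for this example.
func castRay(from screenPoint: CGPoint,
             in sceneView: ARSCNView,
             target: ARRaycastQuery.Target = .estimatedPlane,
             alignment: ARRaycastQuery.TargetAlignment = .any) -> ARRaycastResult? {
    // ARSCNView builds the world-space ray from the view-space point.
    guard let query = sceneView.raycastQuery(from: screenPoint,
                                             allowing: target,
                                             alignment: alignment) else {
        return nil
    }
    // An empty array here is the "no result" case this question is about.
    return sceneView.session.raycast(query).first
}
```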
What I've found is that occasionally the query will return an empty result set for an area of the scene that, intuitively, should have results. This happens throughout the lifecycle of the scene, not just at the beginning, while the scene internals are still building an understanding of the environment. It also happens in portions of a single frame where other rays returned results, even when there is physically little to no difference between the contents and distance at the end of those rays. I most often observe this when targeting estimatedPlane, but have observed it with all plane types. All rays are being cast via ARSession.raycast().
Even more curiously, if I randomly generate something like 10 raycast queries that all fall within a tight pixel radius around the original intended ray, I get results for some rays and none for others, even though the rays are, for all intents and purposes, nearly identical. At present, the most reliable method is to run a loop for no more than N milliseconds in which I pick a random point within a small rectangle centered on the original point and cast a ray, stopping as soon as I've found a result.
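A minimal sketch of that scattershot loop follows. The function name `scattershot`, the 8-point half extent, and the 4 ms budget are placeholder assumptions, as is trying the exact point before any random ones. The cast itself is passed in as a closure so the sampling logic stands on its own.

```swift
import Foundation

/// Tries `cast` at random points in a small square around `center` until one
/// returns a value or the time budget runs out.
/// All names and default values here are illustrative assumptions.
func scattershot<Hit>(around center: CGPoint,
                      halfExtent: CGFloat = 8,
                      budget: TimeInterval = 0.004,
                      cast: (CGPoint) -> Hit?) -> Hit? {
    // Try the originally intended point first.
    if let hit = cast(center) { return hit }

    let deadline = Date().addingTimeInterval(budget)
    while Date() < deadline {
        // Jitter the point uniformly inside the square.
        let candidate = CGPoint(
            x: center.x + CGFloat.random(in: -halfExtent...halfExtent),
            y: center.y + CGFloat.random(in: -halfExtent...halfExtent))
        if let hit = cast(candidate) { return hit }
    }
    return nil  // Nothing found within the budget.
}
```

In the app, the `cast` closure would wrap the view's raycastQuery(from:allowing:alignment:) call followed by ARSession.raycast() and return the first result.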
This seems highly inefficient. I'm now experimenting with ARTrackedRaycast, but there are still cases where, even though the tracked raycast was created, it never calls back with any results.
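Roughly, the tracked variant I'm trying looks like the sketch below. The helper name, the 2-second timeout, and the stop-on-timeout behavior are illustrative assumptions, and the sketch assumes the update handler and the timeout block both run on the main queue.

```swift
import ARKit

/// Starts a tracked raycast and stops it if no result arrives before `timeout`.
/// `startTrackedRaycast` and the timeout policy are hypothetical, for illustration only.
func startTrackedRaycast(from screenPoint: CGPoint,
                         in sceneView: ARSCNView,
                         timeout: TimeInterval = 2.0,
                         onHit: @escaping (ARRaycastResult) -> Void) -> ARTrackedRaycast? {
    guard let query = sceneView.raycastQuery(from: screenPoint,
                                             allowing: .estimatedPlane,
                                             alignment: .any) else {
        return nil
    }

    var receivedResult = false
    let tracked = sceneView.session.trackedRaycast(query) { results in
        // The handler can be invoked repeatedly as ARKit refines the scene.
        guard let first = results.first else { return }
        receivedResult = true
        onHit(first)
    }

    // Give up on raycasts that never call back with a result.
    DispatchQueue.main.asyncAfter(deadline: .now() + timeout) {
        if !receivedResult { tracked?.stopTracking() }
    }
    return tracked
}
```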
My question is: What is happening under the hood that might produce these inconsistencies in the raycast, and are there any approaches that produce results more reliably and predictably? The scattershot technique above can help, but it seems to work against the engine rather than with it. Should we expect pixel-level differences in the engine's ability to cast a ray, even when the physical contents behind those pixels are the same?
Note: This happens on all devices, but is much more frequent on devices without LiDAR.