I wrote an app that saves out the RGB+D image from ARKit and, for each image, exports the FaceAnchor scene (the one shown in the SCNView) as a USD file.
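For reference, the capture step looks roughly like this (a simplified sketch, not my exact code; `saveCapture` and `faceScene` are placeholder names, and the actual image/depth encoding is omitted):

```swift
import ARKit
import SceneKit

// Rough sketch of the capture step (simplified).
// `faceScene` is assumed to be the SCNScene that holds the ARSCNFaceGeometry
// node for the current ARFaceAnchor.
func saveCapture(frame: ARFrame, faceScene: SCNScene, folder: URL, index: Int) {
    // TrueDepth depth data paired with frame.capturedImage
    // (lower resolution, and only present on some frames).
    guard let depthData = frame.capturedDepthData else { return }
    let depthMap: CVPixelBuffer = depthData.depthDataMap
    let colorImage: CVPixelBuffer = frame.capturedImage
    // ... encode colorImage + depthMap to disk here ...
    _ = (depthMap, colorImage)

    // Export the face-anchor scene next to the images as USD.
    let usdURL = folder.appendingPathComponent("face_\(index).usdz")
    _ = faceScene.write(to: usdURL, options: nil, delegate: nil, progressHandler: nil)
}
```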
After converting the RGB+D data to a 3D point cloud and applying the camera transform to it (so it ends up at the right position/rotation/scale relative to the USD camera), I noticed the point cloud sits a bit closer to the camera than the actual FaceAnchor mesh generated by ARKit.
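The depth-pixel-to-world conversion I'm doing is roughly the following (simplified sketch, not my exact code; `intrinsics` is frame.camera.intrinsics scaled from the capturedImage resolution down to the depth-map resolution, and the sign conventions may need adjusting for your setup):

```swift
import simd

// Back-project one depth sample (pixel u, v with depth in meters) to a
// world-space point, given scaled camera intrinsics and the camera-to-world
// transform (frame.camera.transform).
func unproject(u: Float, v: Float, depth: Float,
               intrinsics K: simd_float3x3,
               cameraTransform: simd_float4x4) -> simd_float3 {
    let fx = K[0][0], fy = K[1][1]
    let cx = K[2][0], cy = K[2][1]

    // Pinhole back-projection into ARKit camera space:
    // image y grows downward, camera y grows upward, camera looks down -Z.
    let x = (u - cx) * depth / fx
    let y = -(v - cy) * depth / fy
    let z = -depth

    // Camera space -> world space.
    let world = cameraTransform * simd_float4(x, y, z, 1)
    return simd_float3(world.x, world.y, world.z)
}
```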
The example below demonstrates the problem very well:
I expected the mouth of the point cloud to be aligned with the lips of the FaceAnchor mesh (and the teeth from the depth data to actually sit inside the FaceAnchor mesh's mouth), but as you can see, it is much closer to the camera than that.
By applying an offset of 0.03 (i.e. 3 cm, since the depth is in meters) to the depth values before applying the camera transform, I was able to align the point cloud much better with the FaceAnchor mesh, as shown below:
But the problem is that this 0.03 offset was found by trial and error until I got something resembling what I expected, with this specific photo, at this specific distance from the face to the device... Nothing guarantees it will work in other situations, since it's not a proper mathematical solution to the problem.
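For clarity, this is where the magic 0.03 currently goes in my pipeline, using the unprojection sketch above (again simplified, not my exact code):

```swift
import simd

// The trial-and-error workaround: push every depth sample ~3 cm further from
// the camera before unprojecting it. 0.03 is the magic number I would like to
// replace with something principled.
let depthOffsetMeters: Float = 0.03

func unprojectWithOffset(u: Float, v: Float, rawDepth: Float,
                         intrinsics: simd_float3x3,
                         cameraTransform: simd_float4x4) -> simd_float3 {
    // `unproject` is the sketch from earlier in this post.
    return unproject(u: u, v: v, depth: rawDepth + depthOffsetMeters,
                     intrinsics: intrinsics, cameraTransform: cameraTransform)
}
```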
I'm guessing ARKit generates the mesh slightly behind the measured depth for some reason. In that case, how can I properly transform the depth so that the point cloud aligns with the FaceAnchor mesh position?
Any insights would be greatly appreciated, but an official answer to the issue from someone at Apple would be even better!
-H