Translate ARKit face anchor to camera coordinates

My Swift ARKit app needs the position and orientation of the face relative to the front-facing camera. If I set ARConfiguration.worldAlignment = .camera, all I need to do is read faceAnchor.transform, which works perfectly; but my app needs to run in the default worldAlignment = .gravity so it can work properly with virtual content. In that mode I can get faceAnchor.transform and camera.transform, which are both supplied in world coordinates. How can I use those transforms, or other data/methods, to get the face anchor in camera coordinates? (Specifically, I need a transform that comes out the same as the direct result from worldAlignment = .camera.) I've tried multiplying the two together, as well as multiplying one by the other's inverse, in all four order combinations, but none of the results works. I think I'm missing something basic. Help?!?!

Replies

I was finally able to figure this out using SceneKit functions:


        // World-space transforms for the face anchor and the camera
        let currentFaceTransform = currentFaceAnchor!.transform
        let currentCameraTransform = frame.camera.transform

        let newFaceMatrix = SCNMatrix4(currentFaceTransform)
        let newCameraMatrix = SCNMatrix4(currentCameraTransform)

        // A node placed at the camera's world-space pose
        let cameraNode = SCNNode()
        cameraNode.transform = newCameraMatrix

        // A node at the world origin, so the face transform is
        // interpreted as being in world coordinates
        let originNode = SCNNode()
        originNode.transform = SCNMatrix4Identity

        // Converts a transform from the node's local coordinate space to
        // that of another node - here, from world space into camera space
        let transformInCameraSpace = originNode.convertTransform(newFaceMatrix, to: cameraNode)

        let faceTransformFromCamera = simd_float4x4(transformInCameraSpace)
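
For what it's worth, the SceneKit round trip above should be equivalent to a single matrix operation: converting a world-space transform into camera space amounts to pre-multiplying by the inverse of the camera's world transform. A minimal simd sketch, assuming the same currentFaceAnchor and frame as above (it should produce the same matrix, but treat it as a sketch rather than tested code):

        // Same conversion without SceneKit nodes:
        // world space -> camera space = inverse(camera) * face
        let faceTransformFromCamera = simd_mul(
            simd_inverse(frame.camera.transform),
            currentFaceAnchor!.transform
        )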


I hope this helps some others out there!

I'm new to Swift, so please have patience.

Can you explain in detail how you get the coordinate and rotation values of the faceAnchor? I'm able to access faceAnchor.transform, but that is a 4x4 matrix. What do those numbers mean? The documentation on ARFaceAnchor just vaguely says that it describes the "face’s current position and orientation in world coordinates".

I get that everything is relative and the coordinates are in relation to some defined coordinate system, but what ARE the points ya know?

In my head, it only requires 6 values to define an object in 3D space: x, y, z, and a vector of 3 values to describe the rotation. I also understand that there is gimbal lock, which is probably why there are more than 6 values.

So what exactly are the values in the faceAnchor.transform and how do I extract the x, y, z and rotation?

Thank you.

Hello Tallmiin,

First, I recommend that you do some research on the topic of "4x4 transform matrices"; there is a lot of information available about it.

In general, to get the x, y, and z position from a 4x4 transform, you need to look at the first three elements of the last column in the matrix. The coordinate space this transform is in determines what that position actually represents.
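
For example, here is a minimal Swift sketch of pulling the position out of an anchor's transform (faceAnchor here is a hypothetical ARFaceAnchor; the extraction works the same for any simd_float4x4):

        import ARKit

        // The first three elements of the last column hold the translation
        let transform: simd_float4x4 = faceAnchor.transform
        let position = simd_make_float3(transform.columns.3)  // x, y, z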

To get the "right" vector (i.e. the vector that points along the positive x-axis of the local coordinate space of the object) you need the first three elements of the first column in the matrix.

The "up" vector (i.e. the vector that points along the positive y-axis of the local coordinate space of the object) is the first three elements of the second column in the matrix.

The "forward" vector (i.e. the vector that points along the positive z-axis of the local coordinate space of the object) is the first three elements of the third column in the matrix.