Combining facial landmarks from the iOS Vision framework with depth images

I am capturing depth images with an iPhone TrueDepth camera and using the iOS Vision framework to find face landmarks in the same image. The capture device resolution is 3088x2136 and the depth map is 640x480. I am trying to find the depth of each face landmark point, but I cannot correctly scale the landmarks down to match the depth map dimensions.

This is the code I am currently using:
```swift
let landmarks = [
    lastFaceObservation.landmarks?.leftEye,
    lastFaceObservation.landmarks?.rightEye,
    lastFaceObservation.landmarks?.nose,
    lastFaceObservation.landmarks?.noseCrest,
    lastFaceObservation.landmarks?.medianLine,
    lastFaceObservation.landmarks?.faceContour
]
let landmarkNames = [
    "leftEye", "rightEye", "nose", "noseCrest", "medianLine", "faceContour"
]
var data = ""
let depthMapSize = CGSize(width: 640, height: 480)
for (index, landmark) in landmarks.enumerated() {
    guard let landmark = landmark else { continue }
    for (pointIndex, point) in landmark.normalizedPoints.enumerated() {
        let vectorPoint = simd_float2(Float(point.x), Float(point.y))
        // Landmark point -> full-resolution image coordinates.
        var pixel = VNImagePointForFaceLandmarkPoint(vectorPoint,
                                                     lastFaceObservation.boundingBox,
                                                     Int(captureDeviceResolution.width),
                                                     Int(captureDeviceResolution.height))
        // Scale down to the 640x480 depth map.
        let transform = CGAffineTransform(scaleX: depthMapSize.width / captureDeviceResolution.width,
                                          y: depthMapSize.height / captureDeviceResolution.height)
        pixel = pixel.applying(transform)
        let pixelX = Int(pixel.x)
        let pixelY = Int(pixel.y)
        // Row-major index into the Float32 depth buffer (width is the depth map width).
        let Z = depthPointer[pixelY * width + pixelX]
        let X = (Float(pixelX) - principalPointX) * Z / focalX
        let Y = (Float(pixelY) - principalPointY) * Z / focalY
        data.append("\(landmarkNames[index]), \(pointIndex), \(X), \(Y), \(Z)\n")
    }
}
```
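For what it's worth, here is a minimal, self-contained sketch (plain Swift, no Vision types) of mapping a landmark straight into depth-map coordinates. It includes a vertical flip, since Vision's normalized coordinates have a bottom-left origin while a CVPixelBuffer's rows run top to bottom, which may be one cause of this kind of misalignment. The bounding box and landmark values are made up for illustration:

```swift
import Foundation

// Hypothetical stand-in values; in practice these come from the
// VNFaceObservation and the depth data.
let depthWidth = 640
let depthHeight = 480
// Face bounding box in Vision's normalized, bottom-left-origin coordinates.
let faceBox = (x: 0.25, y: 0.5, width: 0.25, height: 0.25)
// One landmark point, normalized within the bounding box.
let landmark = (x: 0.5, y: 0.5)

// 1. Landmark -> normalized image coordinates (the same math
//    VNImagePointForFaceLandmarkPoint performs, but kept in 0...1
//    so it can be scaled to any target size).
let normX = faceBox.x + landmark.x * faceBox.width
let normY = faceBox.y + landmark.y * faceBox.height

// 2. Normalized coordinates -> depth-map pixels, flipping y because
//    the pixel buffer's origin is top-left.
let px = Int(normX * Double(depthWidth))
let py = Int((1.0 - normY) * Double(depthHeight))

// 3. Clamp to the buffer bounds and form the row-major index.
let cx = min(max(px, 0), depthWidth - 1)
let cy = min(max(py, 0), depthHeight - 1)
let index = cy * depthWidth + cx
print(px, py, index)
```

Going through normalized image coordinates first avoids scaling a full-resolution pixel twice, and makes the flip explicit in one place.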

It is somehow not as simple as taking a landmark point from the full-size image and scaling it down into the 640x480 depth map. When I run this code and plot the resulting depth cloud together with the landmark points, the two do not line up.

I have tried scaling the pixel values down at various points in the pipeline, but that didn't work. I've also tried removing the principal point offset, without success. I suspect there is something wrong with the affine transform, but I'm not sure how to correct it. I would expect the landmark points to line up with the depth cloud.
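Separately, it may be worth double-checking the intrinsics: if `focalX`, `focalY`, `principalPointX`, and `principalPointY` come from `AVCameraCalibrationData`, they are expressed relative to `intrinsicMatrixReferenceDimensions`, not the 640x480 depth map, so they need the same scale factors before back-projecting. A hedged sketch with made-up calibration numbers:

```swift
import Foundation

// Hypothetical calibration values; in practice they come from
// AVCameraCalibrationData.intrinsicMatrix and
// AVCameraCalibrationData.intrinsicMatrixReferenceDimensions.
let referenceSize = (width: 3088.0, height: 2136.0)
let depthSize = (width: 640.0, height: 480.0)
var focalX = 2700.0
var focalY = 2700.0
var principalPointX = 1544.0   // half of referenceSize.width
var principalPointY = 1068.0   // half of referenceSize.height

// Scale the intrinsics from the reference dimensions to the depth map.
let scaleX = depthSize.width / referenceSize.width
let scaleY = depthSize.height / referenceSize.height
focalX *= scaleX
focalY *= scaleY
principalPointX *= scaleX
principalPointY *= scaleY

// Back-project a depth-map pixel into camera space, as in the question.
func backProject(x: Double, y: Double, z: Double) -> (x: Double, y: Double, z: Double) {
    ((x - principalPointX) * z / focalX,
     (y - principalPointY) * z / focalY,
     z)
}

// A pixel at the (scaled) principal point should back-project to X = Y = 0.
let p = backProject(x: principalPointX, y: principalPointY, z: 1.0)
print(p)
```

If the intrinsics are left at full resolution while the pixel coordinates are in depth-map space, the back-projected cloud and the landmarks will disagree even when the 2D scaling is correct.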