Does it make sense that the extrinsic translation is just 2 centimetres from the device anchor?
translation = 0.024845406, -0.02110077, -0.057464134
From measuring the camera location on the Vision Pro, it's definitely more than 2 cm from the "center" of the device.
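As a sanity check on the numbers, here is a minimal sketch that computes the straight-line length of that translation vector. It assumes the values are in metres, which is not stated above, and the variable names are just for illustration.

```python
import math

# Translation components copied from the post; units assumed to be metres.
translation = (0.024845406, -0.02110077, -0.057464134)

# Euclidean length of the offset from the device anchor.
distance_m = math.sqrt(sum(c * c for c in translation))

# Per-axis offsets and total distance in centimetres.
components_cm = [abs(c) * 100 for c in translation]
distance_cm = distance_m * 100

print("per-axis offsets (cm):", [round(c, 1) for c in components_cm])
print("total distance (cm):", round(distance_cm, 1))
```

Under that assumption, the x and y offsets are each roughly 2 cm, the z offset is closer to 5.7 cm, and the total distance comes out to about 6.6 cm, so the comparison depends on whether a single axis or the full vector is being measured.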