About the 3D coordinate system of cameraCalibrationData.extrinsicMatrix

I would like to ask about the translation part (the 3x1 column vector) of cameraCalibrationData.extrinsicMatrix.

I am using an iPhone X (iOS 11.4.1). In dual camera dual photo delivery mode, the translation (3x1 column vector) of the wide-angle photo's extrinsicMatrix is [-13.3001, -0.00406908, 0.0237064]. I take this to mean that the wide-angle camera (lower position) is translated by [-13.3001, -0.00406908, 0.0237064] relative to the reference camera (the telephoto camera, upper position). If so, the X axis of this 3D coordinate system runs from the bottom to the top of the iPhone (Y axis: left to right, Z axis: front to back).

Is it correct?

Replies

I misunderstood the positions of the telephoto camera and the wide-angle camera.

The telephoto camera is in the lower position, and the wide-angle camera is in the upper position.

So the X axis of the 3D coordinate system runs from the top to the bottom of the iPhone. If a right-handed coordinate system is used, the other axes are either {Y axis: right to left, Z axis: front to back} or {Y axis: left to right, Z axis: back to front}. I'd like to know which is correct.

I'm looking at Apple's developer documentation (the article titled "Understanding World Tracking in ARKit").

It includes the following description.


"In all AR experiences, ARKit uses world and camera coordinate systems following a right-handed convention: the y-axis points upward, and (when relevant) the z-axis points toward the viewer and the x-axis points toward the viewer's right."


It is curious that the direction of the x-axis in ARKit does not match the wide-angle photo's extrinsicMatrix (in dual camera dual photo delivery mode) in AVFoundation.

I'll explain in detail. The device I'm using is an iPhone X, which has two cameras (a telephoto camera and a wide-angle camera). The telephoto camera is in the lower position, and the wide-angle camera is in the upper position. I confirmed that, in dual camera dual photo delivery mode, the translation (3x1 column vector) of the wide-angle photo's extrinsicMatrix is [-13.3001, -0.00406908, 0.0237064]. This means that along the x direction the wide-angle camera is translated by -13.3001 (mm) relative to the reference camera (the telephoto camera). So the x-axis must point downward from the viewer's perspective, which differs from ARKit.

Is it true?

I don't think you can deduce that conclusion. What is the rest of the transform? Identity rotation, or 90-degree rotation?


The statement about the right-handed rule doesn't really say which direction the X-axis actually points. It only says that if you hold the phone with its screen facing you, and if you hold it with the X-axis (of whatever coordinate system you're asking about) pointing to your right, then the Y-axis will be pointing up towards the sky (not down at the ground), and the Z-axis will point out of the screen towards your face (not away from you in the direction you're looking). It's about relative positions of the axes, not absolute positions.
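One way to make the "relative, not absolute" point concrete is a cross-product check: in a right-handed frame, x × y equals z. The sketch below is plain Python; the helper names `cross` and `is_right_handed` and the sample axis vectors are illustrative assumptions, not part of AVFoundation or ARKit.

```python
def cross(a, b):
    """Return the cross product a x b of two 3-vectors."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def is_right_handed(x, y, z, tol=1e-9):
    """True if x cross y equals z within tol, i.e. the frame is right-handed."""
    c = cross(x, y)
    return all(abs(ci - zi) <= tol for ci, zi in zip(c, z))


# Axes as described in the reply: x to the right, y up, z toward the viewer.
x_right_frame = is_right_handed([1, 0, 0], [0, 1, 0], [0, 0, 1])

# The same frame rotated 90 degrees about z (x now points up, y points left)
# is still right-handed: handedness alone does not fix where x points.
x_up_frame = is_right_handed([0, 1, 0], [-1, 0, 0], [0, 0, 1])

# Flipping only the z-axis breaks the rule.
flipped_z_frame = is_right_handed([1, 0, 0], [0, 1, 0], [0, 0, -1])

print(x_right_frame, x_up_frame, flipped_z_frame)
```

Both the "x to the right" and the "x up" frames pass the check, which is why the right-handed rule by itself cannot settle which way the X-axis of the extrinsic matrix points.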

The following are the extrinsic matrices of the telephoto camera (reference camera) and the wide-angle camera.


telephoto camera(reference camera) extrinsic matrix

[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]


wide-angle camera extrinsic matrix

[[0.999999, 0.0003352, 0.00140538], [-0.000322289, 0.999958, -0.00917725], [-0.0014084, 0.00917679, 0.999957], [-13.3001, -0.00406908, 0.0237064]]


The wide-angle camera's rotation matrix is almost the identity matrix because both cameras are embedded in the same device.

Of course, the right-handed rule only describes the relative orientation of the axes. I am not drawing a firm conclusion from what I wrote in my previous posts.
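To put a number on "almost identity", the rotation angle can be estimated from the trace of the 3x3 part, using trace(R) = 1 + 2·cos(angle). This is a minimal sketch under the assumption that the first three triples in the dump above are the columns of R; the function name is made up for illustration.

```python
import math

# Rotation part of the wide-angle extrinsic matrix, column by column
# (assumption: the first three triples of the dump are the columns of R).
R_COLUMNS = [
    [0.999999, 0.0003352, 0.00140538],
    [-0.000322289, 0.999958, -0.00917725],
    [-0.0014084, 0.00917679, 0.999957],
]


def rotation_angle_degrees(columns):
    """Angle of a 3x3 rotation, from trace(R) = 1 + 2*cos(angle)."""
    trace = columns[0][0] + columns[1][1] + columns[2][2]
    # Clamp to [-1, 1] so rounding noise cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


angle_deg = rotation_angle_degrees(R_COLUMNS)
print(f"rotation angle: {angle_deg:.3f} degrees")
```

For these values the angle comes out well under one degree, so the two camera frames are nearly parallel and the translation dominates the transform.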

So the wide angle camera has a very slight distortion applied by its 3x3 matrix. It's possible that the translation vector is accounting for the resulting mis-alignment of the image (relative to the telephoto camera), rather than accounting for the distance between cameras.

Q

I'd like to organize the important points of my question, because I had misunderstood the definition of the extrinsic matrix.


1. The extrinsic matrix is defined as the transformation from the world coordinate system to the camera coordinate system. So the coordinates of a point in the world coordinate system are converted to its coordinates in the camera coordinate system by applying the extrinsic matrix.
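The definition above can be sketched in a few lines: for an extrinsic matrix [R | t], a world point p maps to R·p + t in camera coordinates. The snippet is plain Python and assumes the matrix is stored as four 3-element columns (three for R, one for t); the function name and the sample values are hypothetical.

```python
def world_to_camera(extrinsic_columns, p_world):
    """Apply [R | t], stored as four 3-element columns: p_cam = R * p_world + t."""
    c0, c1, c2, t = extrinsic_columns
    return [
        c0[i] * p_world[0] + c1[i] * p_world[1] + c2[i] * p_world[2] + t[i]
        for i in range(3)
    ]


# Identity rotation with zero translation: camera frame equals world frame.
identity_extrinsic = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]

# Identity rotation with a made-up translation of -10 along x.
shifted_extrinsic = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [-10.0, 0.0, 0.0]]

p = [1.0, 2.0, 3.0]
same_point = world_to_camera(identity_extrinsic, p)   # unchanged coordinates
moved_point = world_to_camera(shifted_extrinsic, p)   # x shifted by -10
print(same_point, moved_point)
```

Note that the translation column is added to the point's coordinates; it is the position of the world origin as seen from the camera, not the position of the camera in the world.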


2. With iPhoneX(iOS11.4.1), in dual camera dual photo delivery mode,


telephoto camera(reference camera) extrinsic matrix

[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]


wide-angle camera extrinsic matrix

[[0.999999, 0.0003352, 0.00140538], [-0.000322289, 0.999958, -0.00917725], [-0.0014084, 0.00917679, 0.999957], [-13.3001, -0.00406908, 0.0237064]]


The telephoto camera is the reference camera. This is specified by iOS.

So the telephoto camera's extrinsic matrix is [identity rotation matrix + zero translation vector], which means the telephoto camera's coordinate system is identical to the world coordinate system.

What, then, does the wide-angle camera's extrinsic matrix indicate?

The origin of the wide-angle camera coordinate system is at approximately [13.3001, 0.00406908, -0.0237064] in the world coordinate system, treating the near-identity rotation as exact (strictly, the origin is -Rᵀt). In other words, the origin of the world coordinate system is at [-13.3001, -0.00406908, 0.0237064] in the wide-angle camera coordinate system.
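The camera origin in world coordinates can also be computed without ignoring the rotation: if p_cam = R·p_world + t, the camera center is the world point that maps to zero, i.e. -Rᵀt. The sketch below is plain Python and assumes the dump is column-major (three columns of R followed by t); the function names are illustrative.

```python
# Wide-angle extrinsic matrix from the post, as four 3-element columns
# (assumption: the first three are the columns of R, the last one is t in mm).
EXTRINSIC_COLUMNS = [
    [0.999999, 0.0003352, 0.00140538],
    [-0.000322289, 0.999958, -0.00917725],
    [-0.0014084, 0.00917679, 0.999957],
    [-13.3001, -0.00406908, 0.0237064],
]


def camera_center_in_world(columns):
    """Return -R^T t, the world-space point that the extrinsic maps to zero."""
    c0, c1, c2, t = columns
    # Component j of R^T t is the dot product of column j of R with t.
    return [-sum(cj[i] * t[i] for i in range(3)) for cj in (c0, c1, c2)]


def world_to_camera(columns, p_world):
    """Apply [R | t]: p_cam = R * p_world + t."""
    c0, c1, c2, t = columns
    return [
        c0[i] * p_world[0] + c1[i] * p_world[1] + c2[i] * p_world[2] + t[i]
        for i in range(3)
    ]


center = camera_center_in_world(EXTRINSIC_COLUMNS)
# Sanity check: mapping the center back should give (nearly) zero.
residual = world_to_camera(EXTRINSIC_COLUMNS, center)
print(center, residual)
```

The x component stays at about +13.3 mm, so the conclusion about the x-axis does not depend on the approximation; the much smaller y and z components do shift somewhat once the rotation is taken into account.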

So the x-axis of the wide-angle camera's coordinate system (the wide-angle camera sits above the telephoto camera on the iPhone X) runs approximately from the bottom of the device to the top.


From here on I am speculating. Probably a right-handed convention is used, and the Z-axis probably runs either from the front of the device to the back or from the back to the front. That would make the camera coordinate system either {x-axis: device bottom to top, y-axis: device left to right, z-axis: device front to back} or {x-axis: device bottom to top, y-axis: device right to left, z-axis: device back to front}.

Furthermore, the Exif orientation value of this image is 6 (counterclockwise 90 degrees). So the most likely camera coordinate system is {x-axis: device bottom to top, y-axis: device left to right, z-axis: device front to back}.


3. My question is: which camera coordinate system is used in AVCapturePhoto's cameraCalibrationData?.extrinsicMatrix? I'd like to know whether any documentation for it exists.