What is the DeviceAnchor? Is it the location from which virtual content (and the camera passthrough) is rendered on the user's displays?
My understanding of a 4-by-4 transformation matrix is that the last column represents the translation, while the first three columns represent the local x, y, and z axes. I have plotted the first three columns of the extrinsics matrix and they do not seem to be orthogonal to each other; what are the implications of that? I would expect just a translation and a rotation (without scaling or shearing).
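For reference, here is a minimal sketch of how I am checking this, assuming the extrinsics arrive as a `simd_float4x4` (all names other than the simd types are mine):

```swift
import simd

/// Checks whether the upper-left 3x3 block of a transform is orthonormal,
/// i.e. a pure rotation with no scale or shear.
func isRigidTransform(_ m: simd_float4x4, tolerance: Float = 1e-3) -> Bool {
    // First three columns: the local x, y, and z axes expressed in the parent frame.
    let x = simd_float3(m.columns.0.x, m.columns.0.y, m.columns.0.z)
    let y = simd_float3(m.columns.1.x, m.columns.1.y, m.columns.1.z)
    let z = simd_float3(m.columns.2.x, m.columns.2.y, m.columns.2.z)

    // Unit length (no scaling) and mutually perpendicular (no shearing).
    let unitLength = abs(simd_length(x) - 1) < tolerance
        && abs(simd_length(y) - 1) < tolerance
        && abs(simd_length(z) - 1) < tolerance
    let orthogonal = abs(simd_dot(x, y)) < tolerance
        && abs(simd_dot(y, z)) < tolerance
        && abs(simd_dot(z, x)) < tolerance
    return unitLength && orthogonal
}

/// The fourth column is the translation.
func translation(of m: simd_float4x4) -> simd_float3 {
    simd_float3(m.columns.3.x, m.columns.3.y, m.columns.3.z)
}
```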
What is the coordinate system for these extrinsics? Based on the first three columns, it seems to be x = right, y = down, and z = forward. Is that correct?
Any help or a link to well-written documentation would be very much appreciated. The following documentation page only states the type of the extrinsics property and literally nothing else, which I find insufficient:
https://developer.apple.com/documentation/arkit/cameraframe/sample/parameters/4443449-extrinsics
My apologies, they ARE orthogonal to each other; I made a sign mistake in my plotting. Still, I'd be curious to know the coordinate system. Thank you!
Thanks a lot for all those clarifications! I see at least two use cases in which understanding the camera extrinsics is crucial:
An object is tracked with a specialized algorithm that achieves much higher tracking accuracy for this specific use case than any out-of-the-box tracking solution. A pose is computed relative to the camera; where is the object in world space?
A gridded sheet is tracked using ARKit image tracking via a high-feature texture in its center. The user can color each cell of the grid with one of a set of distinct colors, which the system should interpret. Given a 3D coordinate in world space, which pixel area corresponds to it in the camera frame? (A sketch of this projection follows below.)
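For that second use case, here is a rough sketch of the projection, assuming the extrinsics chain has already been resolved into a world-from-camera transform (see further down), a simple pinhole model with focal lengths fx, fy and principal point cx, cy in pixels, and the camera convention I observed (x = right, y = down, z = forward); all names are mine, not from any API:

```swift
import simd

/// Projects a world-space point into pixel coordinates of a camera frame.
/// - Parameters:
///   - worldPoint: the 3D point in world (origin) space
///   - worldFromCamera: the camera pose in world space (world-from-camera transform)
///   - fx, fy, cx, cy: assumed pinhole intrinsics in pixels
/// - Returns: pixel coordinates, or nil if the point is behind the camera.
func project(worldPoint: simd_float3,
             worldFromCamera: simd_float4x4,
             fx: Float, fy: Float, cx: Float, cy: Float) -> simd_float2? {
    // Move the point into camera space: camera-from-world is the inverse pose.
    let cameraFromWorld = worldFromCamera.inverse
    let p = cameraFromWorld * simd_float4(worldPoint, 1)

    // With x = right, y = down, z = forward, points in front of the camera have z > 0.
    guard p.z > 0 else { return nil }

    // Pinhole projection.
    let u = fx * (p.x / p.z) + cx
    let v = fy * (p.y / p.z) + cy
    return simd_float2(u, v)
}
```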
We are now using the WorldTrackingProvider's queryDeviceAnchor with the current timestamp from CACurrentMediaTime(). Is multiplying the resulting device anchor transform with the camera extrinsics the correct approach to get the camera pose in world space?
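A minimal sketch of the composition we are currently trying (only queryDeviceAnchor, originFromAnchorTransform, and CACurrentMediaTime are ARKit/QuartzCore calls; everything else is mine):

```swift
import ARKit
import QuartzCore
import simd

/// Query the device anchor for "now" and combine its transform with the camera
/// extrinsics. Whether the extrinsics need to be inverted in this product is
/// exactly the open question here.
func cameraInWorld(worldTracking: WorldTrackingProvider,
                   extrinsics: simd_float4x4) -> simd_float4x4? {
    guard let deviceAnchor = worldTracking.queryDeviceAnchor(atTimestamp: CACurrentMediaTime()),
          deviceAnchor.isTracked else {
        return nil
    }
    // originFromAnchorTransform is the device pose in the world (origin) coordinate system.
    return deviceAnchor.originFromAnchorTransform * extrinsics
}
```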
For debugging purposes, we are now drawing the captured frames onto a canvas positioned one meter in front of the camera location (as described above), with a pixel density of 1/focal length, together with a small sphere at the center of the canvas and a tube going from the camera to the center of the canvas. It looks like the "left camera" is really the right camera (from the user's perspective); is that correct?
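For context, the canvas dimensions follow directly from the pinhole model: at a distance of one meter, one pixel covers 1/fx meters horizontally and 1/fy meters vertically, so the full frame spans (imageWidth/fx) by (imageHeight/fy) meters. A small sketch of that arithmetic (names are mine, not from any API):

```swift
/// Physical size, in meters, of a debug canvas placed `distance` meters in front
/// of the camera so that it exactly covers the camera frame (pinhole model).
func canvasSize(imageWidth: Float, imageHeight: Float,
                fx: Float, fy: Float, distance: Float = 1.0) -> (width: Float, height: Float) {
    (width: imageWidth * distance / fx, height: imageHeight * distance / fy)
}
```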
When rendered for the right eye, the tube seems to be pointing perfectly forward, slightly from the top left of the display. Does this mean that the whole scene is rendered from the camera's position? If not, what does it mean?
Unfortunately, we have still not been able to display the tracked object at the correct pose in world space; there is a consistent offset which is very similar to the offset between the passthrough and the rendered frame on the canvas. Thanks a lot for your assistance so far, it is very much appreciated! I will try to test all the assumptions in a minimal project, which we could share if that helps. I will keep you posted here if I make any progress.
I stand corrected about the left camera being the right one from the user's perspective. I reached that conclusion because you said the extrinsics are in a coordinate system in which the x-axis goes towards the user's right, and the extrinsics seem to have a translation with an x component of about 2.5 cm, which would mean that the camera is to the right.
After testing by putting my finger on the actual physical cameras, I saw that it is indeed the left camera. So naturally I asked myself: what am I doing wrong when interpreting the extrinsics?
Well, it turns out the extrinsics do not define the transformation from the device anchor to the camera, but from the camera to the device anchor. I had to invert the matrix, and everything works now.
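In code, the working version is the sketch from before with the extrinsics inverted before composing; this reflects my reading of the fix and my own naming, not an official formula:

```swift
import simd

/// The composition that finally places tracked content correctly for us:
/// world-from-camera = world-from-device * inverse(extrinsics).
func worldFromCamera(originFromDevice: simd_float4x4,
                     extrinsics: simd_float4x4) -> simd_float4x4 {
    originFromDevice * extrinsics.inverse
}

// Usage: an object pose computed relative to the camera can then be lifted
// into world space for rendering, e.g.
// let objectInWorld = worldFromCamera(originFromDevice: anchor.originFromAnchorTransform,
//                                     extrinsics: params.extrinsics) * objectInCamera
```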