It is logical that scene understanding is missing in the visionOS simulator: scene understanding requires sensor data about the physical environment, and the simulator, having no physical sensors, cannot produce such data.
Additionally, it should be noted that the Apple Vision Pro (AVP) includes cameras and LiDAR, but the raw sensor data is not shared with developers for privacy reasons. What is made available to developers is horizontal and vertical plane information (ARPlaneAnchor) and mesh information (ARMeshAnchor) that ARKit generates by processing the sensor data internally.
ARKit in visionOS provides information (ARPlaneAnchor) about the position, orientation and size of a real horizontal or vertical plane.
https://developer.apple.com/documentation/arkit/arplaneanchor
In addition, LiDAR 3D measurement information (ARDepthData) is processed to provide mesh information (ARMeshAnchor).
https://developer.apple.com/documentation/arkit/armeshanchor
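For reference, here is a minimal sketch, assuming a plain iOS ARKit session on a LiDAR device, of where these two anchor types come from (on visionOS the session API differs, but the anchor data provided is broadly analogous):

// A minimal sketch (iOS ARKit): enable plane detection and scene reconstruction,
// then receive ARPlaneAnchor and ARMeshAnchor updates through the session delegate.
import ARKit

func runSession(_ session: ARSession) {
    let configuration = ARWorldTrackingConfiguration()
    configuration.planeDetection = [.horizontal, .vertical]
    if ARWorldTrackingConfiguration.supportsSceneReconstruction(.mesh) {
        configuration.sceneReconstruction = .mesh // requires a LiDAR-equipped device
    }
    session.run(configuration)
}

// In an ARSessionDelegate:
func session(_ session: ARSession, didAdd anchors: [ARAnchor]) {
    for anchor in anchors {
        if let plane = anchor as? ARPlaneAnchor {
            _ = plane.transform        // position and orientation of the detected plane
        } else if let mesh = anchor as? ARMeshAnchor {
            _ = mesh.geometry.vertices // mesh vertices, usable later as a point cloud
        }
    }
}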
So, in theory, app developers can render 2D graphics, such as photos and videos, onto the surfaces described by ARPlaneAnchor and ARMeshAnchor.
In addition, through the analysis of ARDepthData or ARMeshAnchor, various AR applications become possible by accurately determining the shape, size, position, and orientation of real curved surfaces in real time:
https://youtu.be/BmKNmZCiMkw
https://youtu.be/9QkSPkLIfWU
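For reference, a minimal sketch, assuming a plain iOS ARKit session on a LiDAR device, of how the ARDepthData mentioned above is obtained per frame (it is not exposed on visionOS):

// A minimal sketch (iOS ARKit): request scene depth and read the per-frame depth map.
import ARKit

func runDepthSession(_ session: ARSession) {
    let configuration = ARWorldTrackingConfiguration()
    if ARWorldTrackingConfiguration.supportsFrameSemantics(.sceneDepth) {
        configuration.frameSemantics.insert(.sceneDepth)
    }
    session.run(configuration)
}

// In an ARSessionDelegate:
func session(_ session: ARSession, didUpdate frame: ARFrame) {
    guard let depth = frame.sceneDepth else { return } // ARDepthData
    let depthMap: CVPixelBuffer = depth.depthMap       // per-pixel depth in meters
    let confidenceMap = depth.confidenceMap            // optional per-pixel confidence
    _ = (depthMap, confidenceMap)
}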
We have developed a software library (FindSurface SDK) that processes 3D measurement points to accurately estimate the shape, size, position and orientation of the workpiece in real time.
The accuracy of the software is the highest according to ISO 10360-6 (better than 1 micrometer in length and 1 microradian in angle within a volume of 1 m^3), and the processing speed is more than 50 objects/sec on an iPhone 14 Pro (Max) with its LiDAR 3D camera.
A series of 3D measurement points can be a point cloud (or a series of vertices in a mesh). Recognizable and measurable shapes are planes, spheres, cylinders, cones and tori.
For the working principle and demos of FindSurface, please search the web for the following:
GitHub CurvSurf/FindSurface
YouTube CurvSurf
FindSurface web demo.
FindSurface may not be what you are searching for.
If you have a question, please contact the email address listed on GitHub CurvSurf/FindSurface.
If the shape, size, location, and orientation of the surface of a real object can be accurately estimated by processing vertices of ARMeshAnchor in real time, various AR applications are possible. Accurate real-time calculations ensure that such real objects can serve as moving anchors for virtual objects.
Furthermore, if the depth values of the LiDAR's 576 points were provided, AR at another level would be possible. Even in totally dark environments, where the RGB images required for motion tracking cannot be obtained, the 576 LiDAR points can still be measured. By processing these 576 points in real time, you can pinpoint the shape, size, location, and orientation of a real-world object's surface in front of you, even in total darkness. The 576 LiDAR points raise no privacy concerns, yet they enable many positive uses.
By processing the LiDAR's 576 points in real time, you can determine the shape, size, location, and orientation of a real-world object's surface in front of you, even in total darkness.
Not only the objects but also you, the observer, are free to move.
ARPlaneAnchor and ARMeshAnchor (plus scene understanding) will remain the only secondary spatial information provided, derived after visionOS internally processes the primary physical sensor data. Currently, no primary sensor information is available in the visionOS simulator. Although the primary physical sensors, the RGB cameras and the LiDAR 3D camera, are installed in the commercially available Vision Pro, only ARPlaneAnchor and ARMeshAnchor appear to be made available to developers via ARKit on visionOS, in order to protect personal data. Information such as the RGB stream, the LiDAR depth map, facial recognition, and human body contours is apparently not provided. There is absolutely no reason why Apple would allow the development of apps that let users wear Vision Pro on their heads and secretly alter other people's faces and bodies.
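For completeness, a minimal sketch of what the developer-facing side looks like in ARKit on visionOS (device only, not the simulator; assumes the app runs an immersive space and has world-sensing permission). Only the processed plane and mesh anchors are exposed, not the raw camera or LiDAR data:

// A minimal visionOS sketch: receive processed plane and mesh anchors via data providers.
import ARKit

let session = ARKitSession()
let planeDetection = PlaneDetectionProvider(alignments: [.horizontal, .vertical])
let sceneReconstruction = SceneReconstructionProvider()

func startSensing() async throws {
    try await session.run([planeDetection, sceneReconstruction])

    Task {
        for await update in planeDetection.anchorUpdates {
            _ = update.anchor.originFromAnchorTransform // PlaneAnchor: position and orientation
        }
    }
    Task {
        for await update in sceneReconstruction.anchorUpdates {
            _ = update.anchor.geometry.vertices // MeshAnchor: vertices, usable as a point cloud
        }
    }
}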
Night vision enables the visually impaired to see. In other words, it provides information about the shape, size, position, and posture of the objects in front of a visually impaired person. FindSurface can provide this information in real time.
What is clear is that if you can solve a problem with software, you can always save on hardware costs. Even if the data quality is low, if the desired results can be achieved through software, inexpensive hardware should be applied. That is why the role of mathematics stands out.
Here is an example of processing 3D face point cloud data (2019) by CurvSurf FindSurface.
Nose extraction - iPhone X, SR300
https://youtu.be/eCmVYl3GIEY
The FindSurface SDK software library has a binary size of about 300 KB as a Windows 64-bit DLL, or about 1 MB (due to the module name texts) as an iOS/iPadOS/macOS/Linux/Android framework. It is a tiny runtime library that contains many algorithms for region growing, segmentation, model fitting (orthogonal distance fitting), optimization, geometry, curvature analysis, statistics, and probability theory, without any commercial libraries or ML/DL algorithms. It is a simple container of optimized abstract mathematical formulas and strategies.
Apple's current privacy policy doesn't allow this, especially given the failure of Google Glass in 2013. Imagine the person in front of you wearing Vision Pro, recognizing your face, tracking your body movements, and having fun creating virtual, comical transformations of your face and body. Also imagine that it tracks your identity and analyzes your body-movement-related health status in real time. Your private living room could be scanned and misused somewhere, at some point.
An item (a model, or virtual object) has a shape, size, position, and orientation. There is the world reference coordinate system, and your device has its own moving device coordinate system. A real moving or stationary object has its own model coordinate system (with its shape, size, position, and orientation). Finally, you would like to attach your item onto or around a real object's surface as seen on your device screen. This is a real-time AR scenario, and it is still a genuinely difficult, largely unsolved problem.
It is a hard problem, but there is already a solution.
First, you created/designed an item (a model, or virtual object) with a shape and size in the model coordinate system.
Then, the world reference coordinate system was set when you ran your app. The device coordinate system, referenced to the world coordinate system, is determined in real time by ARKit's motion tracking.
What information is still missing at this moment?
What is unknown is the shape, size, position, and orientation of the real object surface of interest (e.g., a real plane).
Our FindSurface runtime library solves your last problem:
FindSurface Web Demo
https://developers.curvsurf.com/WebDemo/
Virtual ads inside Chungmuro station Line 3 - iPhone 12 Pro
https://youtu.be/BmKNmZCiMkw
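Once the shape, size, position, and orientation of the real surface have been estimated, attaching the item is a matter of composing transforms. A minimal sketch, assuming hypothetical surfacePoint and surfaceNormal values already expressed in the world reference coordinate system:

// A minimal sketch: build a world transform that places a virtual item on the real surface,
// with the item's local +Y axis aligned to the estimated surface normal.
import simd

func anchorTransform(surfacePoint: simd_float3, surfaceNormal: simd_float3) -> simd_float4x4 {
    let up = simd_normalize(surfaceNormal)
    // Any reference vector not parallel to the normal works for building a basis.
    let reference: simd_float3 = abs(up.y) < 0.99 ? simd_float3(0, 1, 0) : simd_float3(1, 0, 0)
    let right = simd_normalize(simd_cross(reference, up))
    let forward = simd_cross(right, up) // completes a right-handed basis (right x up = forward)
    return simd_float4x4(columns: (simd_make_float4(right, 0),
                                   simd_make_float4(up, 0),
                                   simd_make_float4(forward, 0),
                                   simd_make_float4(surfacePoint, 1)))
}

The resulting matrix can then be assigned to, for example, the transform of an ARAnchor or a RealityKit entity placed in world space.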
You can use the set of vertices of ARMeshAnchor as a point cloud.
https://developer.apple.com/documentation/arkit/armeshgeometry/3516924-vertices
Then, there are standard methods for determining the bounding box of the point cloud (see the sketch below).
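For example, a minimal sketch, assuming the usual packed three-Float vertex format of ARMeshGeometry, of collecting an ARMeshAnchor's vertices as a world-space point cloud and computing its axis-aligned bounding box:

// A minimal sketch: extract ARMeshAnchor vertices as world-space points
// and compute their axis-aligned bounding box.
import ARKit
import simd

func worldVertices(of meshAnchor: ARMeshAnchor) -> [simd_float3] {
    let vertices = meshAnchor.geometry.vertices // ARGeometrySource
    let base = vertices.buffer.contents()
    var points: [simd_float3] = []
    points.reserveCapacity(vertices.count)
    for i in 0..<vertices.count {
        let pointer = base.advanced(by: vertices.offset + vertices.stride * i)
        let v = pointer.assumingMemoryBound(to: (Float, Float, Float).self).pointee
        let world = meshAnchor.transform * simd_float4(v.0, v.1, v.2, 1) // local -> world
        points.append(simd_make_float3(world))
    }
    return points
}

func boundingBox(of points: [simd_float3]) -> (min: simd_float3, max: simd_float3)? {
    guard var minPoint = points.first, var maxPoint = points.first else { return nil }
    for p in points {
        minPoint = simd_min(minPoint, p)
        maxPoint = simd_max(maxPoint, p)
    }
    return (minPoint, maxPoint)
}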
The normal vector of a mesh is fundamentally sensitive to measurement errors. Therefore, if the measurement error is large, the spatial range over which the average normal vector is calculated at the tap location must be increased.
Additionally, real-life objects have sizes. The size can be approximately expressed as the radius of surface curvature, for example the radius of a cylinder. Therefore, the spatial extent over which the normal vector is calculated at the tap location must be increased in proportion to the radius of curvature of the object surface. The radius of curvature of a plane is very large.
Currently, it appears that the normal vector of a single vertex at the tap location is used. This is quite unstable information.
In summary,
The normal vector calculation range near the tap location must be expanded in proportion to the measurement error.
The normal vector calculation range must be expanded in proportion to the radius of curvature of the target object surface at the tap location.
Therefore, one solution may be to take the average of the normal vectors of several vertices near the tap location (see the sketch below).
App developers must implement this themselves. More preferably, Apple, which has information about the measurement errors, should provide a solution.
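A minimal sketch of that averaging idea, assuming a tap location already converted to a world-space point and an ARMeshAnchor covering it (the radius would grow with the measurement error and with the expected radius of curvature):

// A minimal sketch: average the per-vertex normals of an ARMeshAnchor
// within a given radius of the tapped world-space point.
import ARKit
import simd

func averagedNormal(near tapPoint: simd_float3,
                    radius: Float,
                    in meshAnchor: ARMeshAnchor) -> simd_float3? {
    let vertexSource = meshAnchor.geometry.vertices
    let normalSource = meshAnchor.geometry.normals
    let vertexBase = vertexSource.buffer.contents()
    let normalBase = normalSource.buffer.contents()
    var sum = simd_float3.zero
    var count = 0
    for i in 0..<vertexSource.count {
        let vPtr = vertexBase.advanced(by: vertexSource.offset + vertexSource.stride * i)
        let v = vPtr.assumingMemoryBound(to: (Float, Float, Float).self).pointee
        let worldVertex = meshAnchor.transform * simd_float4(v.0, v.1, v.2, 1)
        guard simd_distance(simd_make_float3(worldVertex), tapPoint) < radius else { continue }
        let nPtr = normalBase.advanced(by: normalSource.offset + normalSource.stride * i)
        let n = nPtr.assumingMemoryBound(to: (Float, Float, Float).self).pointee
        // w = 0 applies only the rotational part of the (rigid) anchor transform.
        let worldNormal = meshAnchor.transform * simd_float4(n.0, n.1, n.2, 0)
        sum += simd_make_float3(worldNormal)
        count += 1
    }
    return count > 0 ? simd_normalize(sum) : nil
}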
The ray casting implemented in a CurvSurf FindSurface demo app is as follows.
The basic information needed for ray casting is:
Ray origin
Ray direction
Ray casting target domain.
In the CurvSurf FindSurface demo app:
Ray origin: Current position of device
Ray direction: toward the center of the device screen (so the ray effectively has 6-DOF)
Ray casting target domain: 3D measurement point (point cloud, or vertex points of mesh).
pickPoint() sets up a viewing cone with the ray direction as its axis and selects the point closest to the device among the points inside the viewing cone. If there are no points inside the viewing cone, the point closest to the viewing cone is selected. (A simplified sketch is given below.)
https://github.com/CurvSurf/FindSurface-GUIDemo-iOS/blob/main/ARKitFindSurfaceDemo/ViewController.swift
https://github.com/CurvSurf/FindSurface-GUIDemo-iOS/blob/main/ARKitFindSurfaceDemo/Helper.swift
let cameraTransform = camera.transform // Right-Handed, column-major
let rayDirection = -simd_make_float3( cameraTransform.columns.2 ) // the camera looks along its -Z axis
let rayOrigin = simd_make_float3( cameraTransform.columns.3 ) // camera position in world space
There is definitely no alternative.
camera.transform is the most accurate real-time information about the camera's 6-DOF under ARKit.
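For illustration, a simplified sketch of the cone-based selection described above (not the demo's actual pickPoint implementation), using the rayOrigin and rayDirection computed from camera.transform:

// A simplified sketch: pick the point closest to the device among the points inside
// a viewing cone around the ray; if the cone is empty, fall back to the point
// with the smallest angular deviation from the ray direction.
import Foundation
import simd

func pickPoint(rayOrigin: simd_float3,
               rayDirection: simd_float3,
               points: [simd_float3],
               coneHalfAngleDegrees: Float = 5) -> simd_float3? {
    let direction = simd_normalize(rayDirection)
    let cosineThreshold = cos(coneHalfAngleDegrees * .pi / 180)

    var bestInside: (point: simd_float3, distance: Float)?
    var bestOutside: (point: simd_float3, cosine: Float)?

    for point in points {
        let toPoint = point - rayOrigin
        let distance = simd_length(toPoint)
        guard distance > 0 else { continue }
        let cosine = simd_dot(toPoint / distance, direction)
        if cosine >= cosineThreshold {
            if bestInside == nil || distance < bestInside!.distance {
                bestInside = (point, distance)
            }
        } else if bestInside == nil {
            if bestOutside == nil || cosine > bestOutside!.cosine {
                bestOutside = (point, cosine)
            }
        }
    }
    return bestInside?.point ?? bestOutside?.point
}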