Background: I am prototyping with RealityKit on iOS 14.1 on the latest 11-inch iPad Pro. My goal is to track a hand. When using skeleton tracking, the skeleton scale did not appear to be adjusted correctly, so some of my samples were off by about 15 cm. So I am experimenting with using Vision to identify the hand and then project the detected point back into 3D space.
1> Run image recognition on ARFrame.capturedImage
```swift
import ARKit
import Vision

// Run hand-pose detection on the raw camera image.
let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage,
                                    orientation: .up,
                                    options: [:])
let handPoseRequest = VNDetectHumanHandPoseRequest()
// ...
try handler.perform([handPoseRequest])
```
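For reference, here is that step fleshed out into a helper that also pulls a landmark out of the results. The .indexTip joint, the maximumHandCount setting, and the 0.3 confidence cutoff are my example choices, not anything required:

```swift
import ARKit
import Vision

// Sketch: run the hand-pose request on one ARFrame and extract a single
// landmark. The joint name and confidence threshold are example choices.
fileprivate func detectIndexTip(in frame: ARFrame) throws -> VNRecognizedPoint? {
    let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage,
                                        orientation: .up,
                                        options: [:])
    let request = VNDetectHumanHandPoseRequest()
    request.maximumHandCount = 1
    try handler.perform([request])
    guard let observation = request.results?.first else { return nil }
    // Vision reports points normalized to the image, origin at the lower left.
    let indexTip = try observation.recognizedPoint(.indexTip)
    return indexTip.confidence > 0.3 ? indexTip : nil
}
```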
2> Convert point to 3D world transform (where the problem is).
```swift
fileprivate func convertVNPointTo3D(_ point: VNRecognizedPoint,
                                    _ session: ARSession,
                                    _ frame: ARFrame,
                                    _ viewSize: CGSize) -> Transform? {
    // Scale the recognized point from the camera image resolution to the view size.
    let pointX = (point.x / Double(frame.camera.imageResolution.width)) * Double(viewSize.width)
    let pointY = (point.y / Double(frame.camera.imageResolution.height)) * Double(viewSize.height)
    let query = frame.raycastQuery(from: CGPoint(x: pointX, y: pointY),
                                   allowing: .estimatedPlane,
                                   alignment: .any)
    let results = session.raycast(query)
    if let first = results.first {
        return Transform(matrix: first.worldTransform)
    } else {
        return nil
    }
}
```
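For context, this is roughly how I drive it from the session delegate. arView is my RealityKit ARView and detectIndexTip is the helper sketched above, so this is only a hypothetical call site:

```swift
// Hypothetical call site in an ARSessionDelegate; arView is assumed to be
// the RealityKit ARView whose session delivers these frames.
func session(_ session: ARSession, didUpdate frame: ARFrame) {
    guard let indexTip = try? detectIndexTip(in: frame),
          let transform = convertVNPointTo3D(indexTip, session, frame,
                                             arView.bounds.size) else { return }
    print("index tip world position: \(transform.translation)")
}
```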
I wonder if I am doing the right conversion. The issue is that the ARSession.raycast documentation describes converting a UI screen point to a 3D point, but I am not sure how ARFrame.capturedImage maps onto the UI screen.
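One alternative I am considering, in case I am reading the docs right: the parameter documentation for ARFrame.raycastQuery(from:allowing:alignment:) describes the point as being in normalized image coordinates ((0, 0) at the top left, (1, 1) at the bottom right of the captured image) rather than UI screen points, which would mean no view size is needed at all. Since the request handler was given the raw capturedImage with .up orientation, Vision's output should already be in that image space, with only the y-axis needing a flip. A minimal, untested sketch:

```swift
import ARKit
import RealityKit
import Vision

// Sketch: raycast directly in normalized image coordinates, assuming
// raycastQuery(from:) expects (0,0) = top left and (1,1) = bottom right
// of the captured image. Vision's origin is the lower-left corner, so
// only the y-axis is flipped; the view size is not involved.
fileprivate func convertVNPointTo3D(_ point: VNRecognizedPoint,
                                    _ session: ARSession,
                                    _ frame: ARFrame) -> Transform? {
    let imagePoint = CGPoint(x: point.location.x, y: 1 - point.location.y)
    let query = frame.raycastQuery(from: imagePoint,
                                   allowing: .estimatedPlane,
                                   alignment: .any)
    guard let first = session.raycast(query).first else { return nil }
    return Transform(matrix: first.worldTransform)
}
```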
Thanks
Hey! Have you managed to implement hand recognition in space? Can you share some sample code?