Vision + RealityKit: Convert a point in ARFrame.capturedImage to 3D World Transform

Background: I am prototyping with RealityKit on iOS 14.1 on the latest 11-inch iPad Pro. My goal is to track a hand. With skeleton tracking, the skeleton scale did not appear to be adjusted correctly, so the results were off by about 15 cm in some of my samples. I am therefore experimenting with using Vision to identify the hand and then projecting the detected points back into 3D space.

1> Run image recognition on ARFrame.capturedImage
Code Block swift
let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage, orientation: .up, options: [:])
let handPoseRequest = VNDetectHumanHandPoseRequest()
....
try handler.perform([handPoseRequest])


2> Convert point to 3D world transform (where the problem is).
Code Block swift
fileprivate func convertVNPointTo3D(_ point: VNRecognizedPoint,
                                    _ session: ARSession,
                                    _ frame: ARFrame,
                                    _ viewSize: CGSize) -> Transform? {
    let pointX = (point.x / Double(frame.camera.imageResolution.width)) * Double(viewSize.width)
    let pointY = (point.y / Double(frame.camera.imageResolution.height)) * Double(viewSize.height)
    let query = frame.raycastQuery(from: CGPoint(x: pointX, y: pointY), allowing: .estimatedPlane, alignment: .any)
    let results = session.raycast(query)
    if let first = results.first {
        return Transform(matrix: first.worldTransform)
    } else {
        return nil
    }
}


I wonder whether I am doing the conversion correctly. The ARSession.raycast documentation says it converts a UI screen point to a 3D point, but I am not sure how ARFrame.capturedImage maps onto the UI screen.

Thanks

  • Hey! Have you managed to implement hand recognition in space? Can you share some sample code?


Accepted Reply

It appears that if I use the .right orientation while holding the iPad in portrait, the image and the detected points align without any conversion (the points detected by Vision are the same points I use for the raycast).
Code Block swift
let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage, orientation: .right, options: [:])
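For anyone who keeps the .up orientation instead, here is a minimal sketch of the conversion that would then be needed. It is not the approach above, just an illustration under two assumptions: Vision returns normalized points with the origin at the lower-left corner, and ARFrame.raycastQuery(from:allowing:alignment:) expects a normalized point in the captured image with the origin at the upper-left corner. The helper name imagePoint(fromVisionPoint:) is hypothetical, and I have only reasoned this through, not measured it on a device.

```swift
import Foundation

/// Hypothetical helper (not part of ARKit or Vision): converts a normalized
/// Vision point (origin at lower-left, y pointing up) into a normalized
/// image point (origin at upper-left, y pointing down).
/// Assumes the Vision request ran on frame.capturedImage with `.up`,
/// so both points refer to the same unrotated sensor image.
func imagePoint(fromVisionPoint p: CGPoint) -> CGPoint {
    // x is unchanged; only the vertical axis is flipped.
    return CGPoint(x: p.x, y: 1.0 - p.y)
}

// Intended use inside an ARKit app (left as comments so this sketch
// compiles without ARKit; `point`, `frame` and `session` come from the
// question's code):
//
//   let normalized = imagePoint(fromVisionPoint: point.location)
//   let query = frame.raycastQuery(from: normalized,
//                                  allowing: .estimatedPlane,
//                                  alignment: .any)
//   let results = session.raycast(query)
```

Under these assumptions, no division by imageResolution or multiplication by the view size is needed, because neither side of the conversion is in pixels.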


