Vision + RealityKit: Convert a point in ARFrame.capturedImage to 3D World Transform

Question

Background: I am prototyping with RealityKit on iOS 14.1 on the latest 11-inch iPad Pro. My goal is to track a hand. With skeleton tracking, the skeleton scale did not appear to be adjusted correctly, so I got results about 15 cm off in some of my samples. I am therefore experimenting with using Vision to identify the hand and then projecting the result back into 3D space.

1> Run image recognition on ARFrame.capturedImage

Code Block
let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage, orientation: .up, options: [:])
let handPoseRequest = VNDetectHumanHandPoseRequest()
....
try handler.perform([handPoseRequest])

2> Convert point to 3D world transform (where the problem is).

Code Block swift
fileprivate func convertVNPointTo3D(_ point: VNRecognizedPoint,
                                    _ session: ARSession,
                                    _ frame: ARFrame,
                                    _ viewSize: CGSize) -> Transform? {
    let pointX = (point.x / Double(frame.camera.imageResolution.width)) * Double(viewSize.width)
    let pointY = (point.y / Double(frame.camera.imageResolution.height)) * Double(viewSize.height)
    let query = frame.raycastQuery(from: CGPoint(x: pointX, y: pointY), allowing: .estimatedPlane, alignment: .any)
    let results = session.raycast(query)
    if let first = results.first {
        return Transform(matrix: first.worldTransform)
    } else {
        return nil
    }
}

I wonder whether I am doing the right conversion. The issue is that the ARSession.raycast documentation says this converts a UI screen point to a 3D point, but I am not sure how ARFrame.capturedImage maps onto the UI screen.
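One thing that may matter here: Vision reports VNRecognizedPoint locations in normalized coordinates (0 to 1) with the origin at the lower-left corner of the image, so dividing by imageResolution as in the function above may not be the intended conversion. Below is a minimal sketch of the vertical flip only. It assumes the raycast entry point expects a normalized point with a top-left origin, which should be checked against the ARFrame.raycastQuery documentation. The function name flipToTopLeftOrigin is made up for this example and is not part of Vision or ARKit.

```swift
import Foundation

/// Hypothetical helper (not an Apple API): converts a normalized point whose
/// origin is the lower-left corner (Vision's convention) into a normalized
/// point whose origin is the upper-left corner. Only the y axis changes.
func flipToTopLeftOrigin(_ visionPoint: CGPoint) -> CGPoint {
    return CGPoint(x: visionPoint.x, y: 1 - visionPoint.y)
}

// A point one quarter in from the left and three quarters up from the bottom
// becomes one quarter in from the left and one quarter down from the top.
let flipped = flipToTopLeftOrigin(CGPoint(x: 0.25, y: 0.75))
print(flipped.x, flipped.y)
```

Whether any further scaling by the view size is needed depends on which raycast API receives the point, so this sketch deliberately stops at the flip.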

Thanks


Posted by

MERELYQUESTIONS

Reply

Hey! Have you managed to implement hand recognition in space? Can you share a sample code?

—
RanguraN


Answer 1

It appears that if I use the .right orientation while holding the iPad in portrait, the image and the detected points align without any conversion (the points detected by Vision are the points used for the raycast).

Code Block swift
let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage, orientation: .right, options: [:])
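The .right value fits portrait because the captured image is delivered in the sensor's landscape orientation. For other device orientations a different value is presumably needed. The sketch below shows one way to pick it, assuming the back camera and following a mapping pattern seen in Apple sample code. The initializer label backCameraImageFor is invented for this example and is not part of ImageIO or UIKit.

```swift
import UIKit
import ImageIO

extension CGImagePropertyOrientation {
    /// Hypothetical convenience initializer (not an Apple API): chooses the
    /// image orientation to hand to Vision for a back-camera frame, given
    /// the current device orientation.
    init(backCameraImageFor deviceOrientation: UIDeviceOrientation) {
        switch deviceOrientation {
        case .portraitUpsideDown:
            self = .left
        case .landscapeLeft:
            self = .up
        case .landscapeRight:
            self = .down
        default:
            // .portrait, and also .faceUp, .faceDown and .unknown,
            // fall back to the portrait value used in the answer above.
            self = .right
        }
    }
}

// Portrait yields the same value that was hard-coded in the handler above.
let orientation = CGImagePropertyOrientation(backCameraImageFor: .portrait)
```

In an app, UIDevice.current.orientation could be passed in and the result used as the orientation argument of VNImageRequestHandler.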

Posted by

MERELYQUESTIONS
