Combining ARKit Face Tracking with High-Resolution AVCapture and Perspective Rendering on Front Camera

Hello Apple Developer Community,

We’re developing an application using the front camera that requires both real-time ARKit face tracking/guidance and the capture of high-resolution still images via AVCaptureSession. Our goal is to leverage ARKit’s depth and face data to render a captured image from another perspective post-capture, maintaining high image quality.

Our Approach:

  1. Real-Time ARKit Guidance:
  • Utilize ARKit (e.g., ARFaceTrackingConfiguration) for continuous face tracking, depth, and scene understanding to guide the user in real time.
  2. High-Resolution Capture Transition:
  • At the moment of capture, we plan to pause the ARKit session and switch to an AVCaptureSession to take a high-resolution image.
  • We assume that for a front-facing image, the subject’s face is directly front-on, and the relative pose between the face and camera remains the same during the transition. The only variation we expect is a change in distance.
  • Our intention is to minimize the delay between the last ARKit frame and the high-res capture to maintain temporal consistency.
  3. Post-Processing Perspective Rendering:
  • Using the last ARKit face data (depth, pose, and landmarks) along with the high-resolution 2D image, we aim to render the scene from another perspective.
  • We want to correct the perspective of the 2D image using SceneKit or RealityKit, leveraging the collected ARKit scene information to achieve a natural, high-quality rendering from a different viewpoint.
  • The rendering should match the quality of a normally captured high-resolution image, adjusting for the difference in distance while using the stored ARKit data to correct perspective. (A sketch of the data we plan to cache at capture time follows this list.)
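
To make step 3 concrete, this is roughly the ARKit state we plan to cache at capture time (a sketch only; the type and names are our own):

    import ARKit

    // Snapshot of the last ARKit frame, cached right before the high-res capture.
    struct CaptureSnapshot {
        let faceTransform: simd_float4x4    // face anchor pose in world space
        let cameraTransform: simd_float4x4  // AR camera pose in world space
        let intrinsics: simd_float3x3       // pinhole intrinsics of the AR camera
        let faceVertices: [SIMD3<Float>]    // face mesh vertices in face-local space
    }

    func makeSnapshot(from frame: ARFrame) -> CaptureSnapshot? {
        guard let face = frame.anchors.compactMap({ $0 as? ARFaceAnchor }).first else {
            return nil
        }
        return CaptureSnapshot(faceTransform: face.transform,
                               cameraTransform: frame.camera.transform,
                               intrinsics: frame.camera.intrinsics,
                               faceVertices: face.geometry.vertices)
    }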

Our Questions:

  1. Session Transition Best Practices:
  • What are the recommended best practices for seamlessly pausing the ARKit session and switching to a high-resolution AVCaptureSession on the front camera?
  • How can we minimize user movement or other issues during this brief transition, given our assumption that the face-camera pose remains largely consistent except for distance changes?
  2. Data Integration for Perspective Rendering:
  • How can we effectively integrate stored ARKit face, depth, and pose data with the high-res image to perform accurate perspective correction or rendering from another viewpoint?
  • Given that we assume the relative pose is constant except for distance, are there strategies or APIs to leverage this assumption for simplifying the perspective transformation?
  3. Perspective Correction with SceneKit/RealityKit:
  • What techniques or workflows using SceneKit or RealityKit are recommended for correcting the perspective of a captured 2D image based on ARKit scene data?
  • How can we use these frameworks to render the high-resolution image from an alternative perspective while maintaining image quality and fidelity? (A rough sketch of the rendering path we imagine follows this list.)
  4. Pitfalls and Guidelines:

  • What common pitfalls should we be aware of when combining ARKit tracking data with high-res capture and post-processing for perspective rendering?
  • Are there performance considerations, recommended thresholds for acceptable temporal consistency, or validation techniques to ensure the ARKit data remains applicable at the moment of high-res capture?
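
For question 3, the rendering path we imagine is roughly: texture ARKit's face mesh with the captured still, then render it offscreen from a displaced virtual camera. A sketch under those assumptions; computing texture coordinates by projecting the mesh through the capture camera is the part we are unsure about:

    import ARKit
    import Metal
    import SceneKit
    import UIKit

    func renderFaceFromNewViewpoint(faceGeometry: ARFaceGeometry,
                                    faceTransform: simd_float4x4,
                                    stillImage: UIImage,
                                    virtualCameraTransform: simd_float4x4) -> UIImage? {
        guard let device = MTLCreateSystemDefaultDevice(),
              let scnFaceGeometry = ARSCNFaceGeometry(device: device) else { return nil }
        // Copy the tracked mesh into SceneKit geometry.
        scnFaceGeometry.update(from: faceGeometry)
        // NOTE: the default face UVs will not line up with the photo; real code
        // must derive texture coordinates from the capture camera's intrinsics.
        scnFaceGeometry.firstMaterial?.diffuse.contents = stillImage

        let scene = SCNScene()
        let faceNode = SCNNode(geometry: scnFaceGeometry)
        faceNode.simdTransform = faceTransform
        scene.rootNode.addChildNode(faceNode)

        // Virtual camera at the alternative viewpoint.
        let cameraNode = SCNNode()
        cameraNode.camera = SCNCamera()
        cameraNode.simdTransform = virtualCameraTransform
        scene.rootNode.addChildNode(cameraNode)

        // Offscreen render at the still image's size.
        let renderer = SCNRenderer(device: device, options: nil)
        renderer.scene = scene
        renderer.pointOfView = cameraNode
        return renderer.snapshot(atTime: 0, with: stillImage.size,
                                 antialiasingMode: .multisampling4X)
    }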

We appreciate any advice, sample code references, or documentation pointers that could assist us in implementing this workflow effectively.

Thank you!

Hi @ONIO,

Have you tried the ARKit session API for capturing a high-resolution image:

https://developer.apple.com/documentation/arkit/arsession/3975720-capturehighresolutionframe

This allows you to grab a high-resolution image while keeping the ARKit session running to continue providing guidance. There is a WWDC 2022 video that explains more:

https://developer.apple.com/videos/play/wwdc2022/10126
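
A minimal sketch of calling it (iOS 16 and later; error handling elided):

    // Grab one high-resolution ARFrame while the session keeps running.
    session.captureHighResolutionFrame { frame, error in
        if let frame {
            // frame.capturedImage is a CVPixelBuffer at the hi-res video format.
            let highResPixelBuffer = frame.capturedImage
            // ... process or save the still here ...
        } else {
            print("High-resolution capture failed:", String(describing: error))
        }
    }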

Hi @Vision Pro Engineer, thank you for the links. I did indeed try captureHighResolutionFrame(completion:) on my iPhone 14 Pro with iOS 18.1.1 and was able to get a 1512 × 2016 frame instead of the standard 1080 × 1440, testing with the Tracking and Visualizing Faces sample app. Unfortunately, our use case requires at least the 7 MP (2316 × 3088) resolution of the front camera. Is this (1512 × 2016) actually the highest-resolution frame I can get with my setup and ARKit, or do I need to pay attention to other configuration settings?

In the video, everything is about ARWorldTrackingConfiguration. Does it also apply to ARFaceTrackingConfiguration? I am asking because I was not able to get a higher-resolution stream. The following returned nil:

    let config = ARFaceTrackingConfiguration()
    if let hiResCaptureVideoFormat = ARFaceTrackingConfiguration.recommendedVideoFormatForHighResolutionFrameCapturing {
        // Assign the video format that supports hi-res capturing.
        config.videoFormat = hiResCaptureVideoFormat
    }
    // Run the session.
    session.run(config)
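
For reference, the formats that face tracking reports as supported can be listed like this:

    // Print every video format ARKit face tracking supports on this device.
    for format in ARFaceTrackingConfiguration.supportedVideoFormats {
        print(format.imageResolution, format.framesPerSecond)
    }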

Since features like triggering focus events and other device settings could also be beneficial, I tried to access the device as described in the video:

    if let device = ARFaceTrackingConfiguration.configurableCaptureDeviceForPrimaryCamera {
        do {
            try device.lockForConfiguration()
            // configure AVCaptureDevice settings
            …
            device.unlockForConfiguration()
        } catch {
            // error handling
            …
        }
    }

But I was not able to access the device this way. Should this be possible?

I also investigated fast session switching, but was not able to get it below 1.6 seconds, which heavily breaks the user experience. Below you can find a condensed version of the code I used to switch sessions and capture an image.
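
(Condensed for readability; threading, error handling, and preview wiring are omitted, and names like SessionSwitcher are placeholders.)

    import ARKit
    import AVFoundation

    final class SessionSwitcher: NSObject, AVCapturePhotoCaptureDelegate {
        let arSession = ARSession()
        let captureSession = AVCaptureSession()
        let photoOutput = AVCapturePhotoOutput()

        func captureHighResStill() throws {
            // 1. Stop ARKit so the camera is free for AVFoundation.
            arSession.pause()

            // 2. Configure a photo-preset session on the front TrueDepth camera.
            guard let device = AVCaptureDevice.default(.builtInTrueDepthCamera,
                                                       for: .video,
                                                       position: .front) else { return }
            captureSession.beginConfiguration()
            captureSession.sessionPreset = .photo
            let input = try AVCaptureDeviceInput(device: device)
            if captureSession.canAddInput(input) { captureSession.addInput(input) }
            if captureSession.canAddOutput(photoOutput) { captureSession.addOutput(photoOutput) }
            captureSession.commitConfiguration()

            // 3. startRunning() blocks until the pipeline is live; this is
            //    where most of the observed latency accumulates.
            captureSession.startRunning()
            photoOutput.capturePhoto(with: AVCapturePhotoSettings(), delegate: self)
        }

        func photoOutput(_ output: AVCapturePhotoOutput,
                         didFinishProcessingPhoto photo: AVCapturePhoto,
                         error: Error?) {
            // 4. Tear down AVFoundation and resume ARKit face tracking.
            captureSession.stopRunning()
            arSession.run(ARFaceTrackingConfiguration())
        }
    }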

Since we only need the face orientation and face landmarks from ARKit, we looked into other methods for obtaining them and found the Vision framework. Are there other options?
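
Something along these lines is what we are considering with Vision (a sketch; orientation handling is simplified):

    import Vision

    // Coarse head pose and 2D landmarks from a single front-camera frame.
    func detectHeadPose(in pixelBuffer: CVPixelBuffer) {
        let request = VNDetectFaceLandmarksRequest { request, _ in
            guard let face = (request.results as? [VNFaceObservation])?.first else { return }
            // roll/yaw (and pitch on iOS 15+) are coarse pose angles in radians.
            print("roll:", face.roll ?? 0, "yaw:", face.yaw ?? 0, "pitch:", face.pitch ?? 0)
            // Normalized 2D landmark points within the face bounding box.
            if let points = face.landmarks?.allPoints?.normalizedPoints {
                print("landmarks:", points.count)
            }
        }
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                            orientation: .leftMirrored)
        try? handler.perform([request])
    }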

How would you detect a head pose with the back camera?

Thank you in advance; this is a really pressing topic on our side.
