To create signatures for human faces and compare their similarity, I'm using ARKit's capturedImage from ARFrame, which comes from the front-facing camera when running an ARFaceTrackingConfiguration. However, compared to using the Vision and AVFoundation frameworks, the quality of the signature analysis is significantly impacted by the capturedImage's low resolution. The resolution of the capturedImage in ARKit is just 640x480 (the same resolution reported by capturedDepthData), even though the video format is set to the highest supported resolution:
let configuration = ARFaceTrackingConfiguration()

// Pick the supported video format with the largest image resolution (by pixel count).
if let videoFormat = ARFaceTrackingConfiguration.supportedVideoFormats
    .max(by: { ($0.imageResolution.width * $0.imageResolution.height) < ($1.imageResolution.width * $1.imageResolution.height) }) {
    configuration.videoFormat = videoFormat
}
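As a sanity check, something like the following delegate method can log the actual buffer sizes (a minimal sketch; the delegate type and wiring here are just for illustration):

import ARKit

final class FrameSizeLogger: NSObject, ARSessionDelegate {
    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        // Size of the color buffer delivered on every frame.
        let imageBuffer = frame.capturedImage
        let imageWidth = CVPixelBufferGetWidth(imageBuffer)
        let imageHeight = CVPixelBufferGetHeight(imageBuffer)

        // Size of the depth buffer from the TrueDepth camera, when present.
        if let depthBuffer = frame.capturedDepthData?.depthDataMap {
            let depthWidth = CVPixelBufferGetWidth(depthBuffer)
            let depthHeight = CVPixelBufferGetHeight(depthBuffer)
            print("capturedImage: \(imageWidth)x\(imageHeight), depth: \(depthWidth)x\(depthHeight)")
        } else {
            print("capturedImage: \(imageWidth)x\(imageHeight)")
        }
    }
}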
I also tried using captureHighResolutionFrame, as well as changing the video format:
// Use the video format Apple recommends for high-resolution frame capture.
if let videoFormat = ARFaceTrackingConfiguration.recommendedVideoFormatForHighResolutionFrameCapturing {
    configuration.videoFormat = videoFormat
}
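The capture itself is requested asynchronously, roughly along these lines (a minimal sketch, assuming session is the ARSession driving the face tracking):

// Request a single high-resolution frame (iOS 16+). The completion handler fires
// out-of-band, separately from the regular session(_:didUpdate:) frames.
session.captureHighResolutionFrame { highResFrame, error in
    guard let highResFrame = highResFrame else {
        print("High-resolution capture failed: \(error?.localizedDescription ?? "unknown error")")
        return
    }
    let buffer = highResFrame.capturedImage
    print("High-res capturedImage: \(CVPixelBufferGetWidth(buffer))x\(CVPixelBufferGetHeight(buffer))")
}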
However, according to the documentation:
"The system delivers a high-resolution frame out-of-band, which means that it doesn't affect the other frames that the session receives at a regular interval."
Because the high-resolution frames are delivered asynchronously, the result seems to alternate between the standard captured images and the high-resolution images rather than replacing the regular captured images. This is a concern because, depending on the size differences, displayTransform and CGAffineTransform have to be applied differently to scale the images.
On top of that, I need to be able to use the frames continuously at 30 fps or 60 fps as they're produced, rather than taking occasional pictures, which is what the captureHighResolutionFrame method seems to be designed for, given that it plays a shutter sound.
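For reference, something like this can confirm what the chosen video format claims to deliver (a small sketch; configuration is the ARFaceTrackingConfiguration from above):

// The active video format reports the expected capturedImage size and the capture rate.
let format = configuration.videoFormat
print("Video format: \(Int(format.imageResolution.width))x\(Int(format.imageResolution.height)) at \(format.framesPerSecond) fps")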
In order to use the captured image, I'm currently transforming it in the following way:
// Wrap the pixel buffer in a CIImage and note its pixel dimensions.
let image: CIImage = CIImage(cvImageBuffer: imageBuffer)
let imageSize: CGSize = CGSize(width: CVPixelBufferGetWidth(imageBuffer), height: CVPixelBufferGetHeight(imageBuffer))

// Scale the image down to normalized coordinates (0...1 on both axes).
let normalizeTransform: CGAffineTransform = CGAffineTransform(scaleX: 1.0 / imageSize.width, y: 1.0 / imageSize.height)

// In portrait, flip the normalized image on both axes (a 180° rotation within the unit square).
let flipTransform: CGAffineTransform = metadata.orientation.isPortrait
    ? CGAffineTransform(scaleX: -1, y: -1).translatedBy(x: -1, y: -1)
    : .identity

guard let viewPort: CGRect = face.viewPort else { return nil }
let viewPortSize: CGSize = viewPort.size

// ARKit's transform from normalized image coordinates to the viewport's
// normalized coordinate space, accounting for the interface orientation.
guard let displayTransform: CGAffineTransform = face.arFrame?.displayTransform(for: metadata.orientation, viewportSize: viewPortSize) else {
    return nil
}

// Scale the normalized coordinates back up to the viewport's size in points.
let viewPortTransform: CGAffineTransform = CGAffineTransform(scaleX: viewPortSize.width, y: viewPortSize.height)

// Apply the transforms in order, then crop to the visible viewport.
let scaledImage: CIImage = image
    .transformed(by: normalizeTransform
        .concatenating(flipTransform)
        .concatenating(displayTransform)
        .concatenating(viewPortTransform)
    )
    .cropped(to: viewPort)
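The resulting scaledImage then goes into the signature step. Rendering it into a CGImage for downstream analysis could look like this (a minimal sketch; in practice the CIContext should be created once and reused rather than per frame):

// Render the cropped CIImage into a CGImage so it can be handed to the
// face-signature / Vision analysis step. Reuse a shared CIContext in practice.
let ciContext = CIContext()
guard let cgImage = ciContext.createCGImage(scaledImage, from: scaledImage.extent) else {
    return nil
}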