I'm building an app and one of the requirements is being able to get a somewhat accurate estimate for a person's height. Getting within an inch (maybe two) is fine but a delta greater than that and it won't work.
I'm using ARBodyTrackingConfiguration to get the detected ARAnchor/ARSkeleton, and I'm seeing it come into the session delegate. To calculate the height, I've tried two methods:
1) Take the jointModelTransforms for the right_toes_joint and the head_joint and find the difference in their y coordinates.
2) Build a bounding box containing the jointModelTransforms of every joint in the skeleton, then find the difference between the y coordinates of the bounding box's min and max.
To account for the distance between the head_joint and the crown of my head, I take the distance from the neck_3_joint (neck) to the head_joint and add it to the value from either method 1) or 2). Why this particular calculation? Because it should roughly account for the missing height, based on the proportions artists use when drawing faces.
Both methods yield the same value (good), but my height comes through at 1.71 meters, about 5'7" (bad, since I'm 6'0").
I know there's an estimatedScaleFactor that is potentially supposed to be used to correct for some discrepancies, but this value always comes in at < 1, which means applying it will only make my final height calculation smaller.
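For concreteness, the only application of it I can see would be something like this (a sketch, using realHeight and bodyAnchor from the code further down):

// Hypothetical correction; estimatedScaleFactor is a CGFloat and always
// comes in < 1 for me, so this only shrinks the estimate further.
let scaledHeight = realHeight * Float(bodyAnchor.estimatedScaleFactor)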
I know what I'm trying to do should be possible because Apple's own Measure app can do this on my iPhone 14 Pro. This leaves two possibilities (or maybe another?):
1) I'm doing something wrong.
2) Apple's Measure app has access to something I don't.
Here's the code I'm using that demonstrates method 1. There's enough of method 2 in here as well that you should be able to see what I'm trying in that case.
func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
    for anchor in anchors {
        guard let bodyAnchor = anchor as? ARBodyAnchor else { continue }
        let skeleton = bodyAnchor.skeleton

        // Method 2: accumulate every joint position into a bounding box.
        var bodyBoundingBox = BoundingBox()
        for i in skeleton.definition.jointNames.indices {
            let position = skeleton.jointModelTransforms[i].columns.3
            bodyBoundingBox = bodyBoundingBox.union(SIMD3(x: position.x, y: position.y, z: position.z))
        }

        // Method 1: key joint indices in the skeleton definition:
        // [10] right_toes_joint
        // [51] head_joint
        // [48] neck_2_joint
        // [49] neck_3_joint
        // [50] neck_4_joint
        let toesJointY = skeleton.jointModelTransforms[10].columns.3.y
        let headJointY = skeleton.jointModelTransforms[51].columns.3.y
        let neckJointY = skeleton.jointModelTransforms[49].columns.3.y

        // Toes-to-head height, plus the neck_3_joint-to-head_joint distance
        // as a stand-in for the missing head_joint-to-crown segment.
        let intermediateHeight = headJointY - toesJointY
        let headToCrown = headJointY - neckJointY

        // Final height. Scale by bodyAnchor.estimatedScaleFactor?
        let realHeight = intermediateHeight + headToCrown
    }
}
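(BoundingBox is a small helper type I wrote; I haven't included my exact implementation, but a minimal version along these lines captures the idea:)

import simd

// Minimal axis-aligned bounding box (a sketch, not the exact helper used above).
struct BoundingBox {
    var minPoint = SIMD3<Float>(repeating: .greatestFiniteMagnitude)
    var maxPoint = SIMD3<Float>(repeating: -.greatestFiniteMagnitude)

    // Returns a box expanded to contain the given point.
    func union(_ point: SIMD3<Float>) -> BoundingBox {
        var result = self
        result.minPoint = simd_min(result.minPoint, point)
        result.maxPoint = simd_max(result.maxPoint, point)
        return result
    }

    // Method 2's height is just the y extent of the box.
    var height: Float { maxPoint.y - minPoint.y }
}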
I'm currently building an iOS app that requires the ability to estimate a person's height from a live video stream. The new VNDetectHumanBodyPose3DRequest is exactly what I need, but the observations I'm getting back are very inconsistent and unreliable. By inconsistent, I mean the values never seem to settle and can fluctuate anywhere from 5'4" to 10'1" (I'm about 6'0"). By unreliable, I mean I have seen a value that closely matches my height exactly once, but I rarely see any values close enough (within an inch) to the ground truth.
In terms of my code, I'm not doing anything fancy. I first open a LiDAR stream on my iPhone 14 Pro:
guard let videoDevice = AVCaptureDevice.default(.builtInLiDARDepthCamera, for: .video, position: .back) else { return }
guard let videoDeviceInput = try? AVCaptureDeviceInput(device: videoDevice) else { return }
guard captureSession.canAddInput(videoDeviceInput) else { return }
captureSession.addInput(videoDeviceInput)
I'm then creating an output synchronizer so I can get both image and depth data at the same time:
videoDataOutput = AVCaptureVideoDataOutput()
captureSession.addOutput(videoDataOutput)
depthDataOutput = AVCaptureDepthDataOutput()
depthDataOutput.isFilteringEnabled = true
captureSession.addOutput(depthDataOutput)
outputVideoSync = AVCaptureDataOutputSynchronizer(dataOutputs: [depthDataOutput, videoDataOutput])
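For completeness, the delegate wiring is roughly the following (a sketch; dataOutputQueue is just my serial dispatch queue, and self conforms to AVCaptureDataOutputSynchronizerDelegate):

outputVideoSync.setDelegate(self, queue: dataOutputQueue)

func dataOutputSynchronizer(_ synchronizer: AVCaptureDataOutputSynchronizer,
                            didOutput synchronizedDataCollection: AVCaptureSynchronizedDataCollection) {
    // Pull the matched video and depth samples, skipping dropped frames.
    guard let syncedVideo = synchronizedDataCollection.synchronizedData(for: videoDataOutput) as? AVCaptureSynchronizedSampleBufferData,
          let syncedDepth = synchronizedDataCollection.synchronizedData(for: depthDataOutput) as? AVCaptureSynchronizedDepthData,
          !syncedVideo.sampleBufferWasDropped,
          !syncedDepth.depthDataWasDropped
    else { return }
    perform3DPoseRequest(cmSampleBuffer: syncedVideo.sampleBuffer, depthData: syncedDepth.depthData)
}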
Finally, the function that actually runs the Vision request on each synchronized pair is roughly:
fileprivate func perform3DPoseRequest(cmSampleBuffer: CMSampleBuffer, depthData: AVDepthData) {
    let imageRequestHandler = VNImageRequestHandler(cmSampleBuffer: cmSampleBuffer,
                                                    depthData: depthData,
                                                    orientation: .up)
    let request = VNDetectHumanBodyPose3DRequest()
    do {
        // Perform the body pose request.
        try imageRequestHandler.perform([request])
        if let observation = request.results?.first {
            // Only report heights Vision actually measured (vs. a fallback reference height).
            if observation.heightEstimation == .measured {
                print("Body height (ft) \(formatter.string(fromMeters: Double(observation.bodyHeight))) (m): \(observation.bodyHeight)")
                ...
I'd appreciate any help determining how to get accurate results from the observation's bodyHeight. Thanks!