Improve body tracking of Vision VNDetectHumanBodyPoseRequest

Question

I am trying to improve the performance of drawing the skeleton with body tracking as I am getting noticeable lag even when further than 5 metres away, and with stable iPhone XS camera. The tracking is not close to the showcased performance in WWDC-10043 demo video.

I have also tried using:

Code Block let request = VNDetectHumanBodyPoseRequest(completionHandler: )
...

however the results were the same, and I also tried using revision 1 of the algorithm:

Code Block let request = VNDetectHumanBodyPoseRequest( )
request.revision = VNDetectHumanBodyPoseRequestRevision1
...

and this didn't help either.

Here's my current code:

Code Block /// Extracts poses from a frame.
class Predictor {
  func processFrame(_ samplebuffer: CMSampleBuffer) throws -> [VNRecognizedPointsObservation] {
    // Perform Vision body pose request
    let framePoses = extractPoses(from: samplebuffer)
    
    // Select the most promiment person.
    guard let pose = framePoses.first else {
      return []
    }
    
    return framePoses
  }
  
  func extractPoses(from sampleBuffer: CMSampleBuffer) -> [VNRecognizedPointsObservation] {
    let requestHandler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer, orientation: .down)
    
    let request = VNDetectHumanBodyPoseRequest()
    
    do {
      // Perform the body pose-detection request.
      try requestHandler.perform([request])
    } catch {
      print("Unable to perform the request: \(error).\n")
    }
    
    return bodyPoseHandler(request: request, error: nil)
  }
  func bodyPoseHandler(request: VNRequest, error: Error?) -> [VNRecognizedPointsObservation] {
    guard let observations =
            request.results as? [VNRecognizedPointsObservation] else {
      print("Empty observations.\n\n")
      return []
    }
    return observations
  }
}
class CameraViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    let observations = try? predictor.processFrame(sampleBuffer)
    observations?.forEach { processObservation($0) }
}
func processObservation(_ observation: VNRecognizedPointsObservation) {
    
    // Retrieve all torso points.
    guard let recognizedPoints =
            try? observation.recognizedPoints(forGroupKey: .all) else {
      return
    }
    
    let storedPoints = Dictionary(uniqueKeysWithValues: recognizedPoints.compactMap { (key, point) -> (String, CGPoint)? in
      return (key.rawValue, point.location)
    })
    
    DispatchQueue.main.sync {
      let mappedPoints = Dictionary(uniqueKeysWithValues: recognizedPoints.compactMap { (key, point) -> (String, CGPoint)? in
        guard point.confidence > 0.1 else { return nil }
        let norm = VNImagePointForNormalizedPoint(point.location,
                                                  Int(drawingView.bounds.width),
                                                  Int(drawingView.bounds.height))
        return (key.rawValue, norm)
      })
      
      let time = 1000 * observation.timeRange.start.seconds
      
      // Draw the points onscreen.
      DispatchQueue.main.async {
        self.drawingView.draw(points: mappedPoints)
      }
    }
  }
}

Thanks in advance, I hope you can help me out! :)

875

Posted by

LemonSpike

Reply

Add a Comment