DockKit tracking becomes erratic with increased zoom factor in iOS app

I'm developing an iOS app using DockKit to control a motorized stand. I've noticed that as the zoom factor of the AVCaptureDevice increases, the stand's movement becomes increasingly erratic up and down, almost like a pendulum motion. I'm not sure why this is happening or how to fix it.

Here's a simplified version of my tracking logic:

func trackObject(_ boundingBox: CGRect, _ dockAccessory: DockAccessory) async throws {
    guard let device = AVCaptureDevice.default(for: .video),
          let input = try? AVCaptureDeviceInput(device: device) else {
        fatalError("Camera not available")
    }
    
    let currentZoomFactor = device.videoZoomFactor
    let dimensions = device.activeFormat.formatDescription.dimensions
    let referenceDimensions = CGSize(width: CGFloat(dimensions.width), height: CGFloat(dimensions.height))
    
    let intrinsics = calculateIntrinsics(for: device, currentZoom: Double(currentZoomFactor))
    
    let deviceOrientation = UIDevice.current.orientation
    let cameraOrientation: DockAccessory.CameraOrientation = {
        switch deviceOrientation {
        case .landscapeLeft: return .landscapeLeft
        case .landscapeRight: return .landscapeRight
        case .portrait: return .portrait
        case .portraitUpsideDown: return .portraitUpsideDown
        default: return .unknown
        }
    }()
    
    let cameraInfo = DockAccessory.CameraInformation(
        captureDevice: input.device.deviceType,
        cameraPosition: input.device.position,
        orientation: cameraOrientation,
        cameraIntrinsics: useIntrinsics ? intrinsics : nil,
        referenceDimensions: referenceDimensions
    )
    
    let observation = DockAccessory.Observation(
        identifier: 0,
        type: .object,
        rect: boundingBox
    )
    let observations = [observation]
    
    try await dockAccessory.track(observations, cameraInformation: cameraInfo)
}

func calculateIntrinsics(for device: AVCaptureDevice, currentZoom: Double) -> matrix_float3x3 {
    let dimensions = CMVideoFormatDescriptionGetDimensions(device.activeFormat.formatDescription)
    let width = Float(dimensions.width)
    let height = Float(dimensions.height)
    
    let diagonalPixels = sqrt(width * width + height * height)
    let estimatedFocalLength = diagonalPixels * 0.8
    
    let fx = Float(estimatedFocalLength) * Float(currentZoom)
    let fy = fx
    let cx = width / 2.0
    let cy = height / 2.0
    
    return matrix_float3x3(
        SIMD3<Float>(fx, 0, cx),
        SIMD3<Float>(0, fy, cy),
        SIMD3<Float>(0, 0, 1)
    )
}

I'm calling this function regularly (10-30 times per second) with updated bounding box information. The erratic movement seems to worsen as the zoom factor increases.

Questions:

  1. Why might increasing the zoom factor cause this erratic movement?
  2. I'm currently calculating camera intrinsics based on the current zoom factor. Is this approach correct, or should I be doing something differently?
  3. Are there any other factors I should consider when using DockKit with a variable zoom?
  4. Could the frequency of calls to trackObject (10-30 times per second) be contributing to the erratic movement? If so, what would be an optimal frequency?

Any insights or suggestions would be greatly appreciated. Thanks!

Be sure to pass in the camera intrinsics. Rather than compute them yourself, pull them from the AVCaptureDevice.

I've seen something similar. At the default zoom it's fine, but as the zoom increases, the tracking system doesn't know that your view is zoomed in, because the intrinsics are incorrect. So a small offset at low zoom becomes a big offset at higher zoom, and the system tells the accessory to rotate too much. The result is a feedback loop.
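To put rough numbers on that, here is a minimal sketch under a simple pinhole-camera assumption. The focal length and pixel offset are made-up values for illustration, and `angularOffset` and `overshootFactor` are hypothetical helpers, not DockKit API. It compares the angle a subject is really off-centre with the angle a tracker would infer if it still believed the 1x focal length.

```swift
import Foundation

/// Angle (in radians) subtended by a pixel offset from the image centre,
/// for a pinhole camera with the given focal length in pixels.
func angularOffset(pixelOffset: Double, focalLengthPixels: Double) -> Double {
    atan(pixelOffset / focalLengthPixels)
}

// Hypothetical values, for illustration only.
let baseFocalLength = 1_500.0   // assumed focal length in pixels at 1x zoom
let pixelOffset = 100.0         // subject is 100 px off-centre in the frame

/// Ratio between the angle inferred from the stale 1x focal length
/// and the angle the zoomed camera is actually seeing.
func overshootFactor(zoom: Double) -> Double {
    // Zooming by `zoom` multiplies the effective focal length by `zoom`.
    let trueAngle = angularOffset(pixelOffset: pixelOffset,
                                  focalLengthPixels: baseFocalLength * zoom)
    // A tracker that is unaware of the zoom keeps using the 1x focal length.
    let assumedAngle = angularOffset(pixelOffset: pixelOffset,
                                     focalLengthPixels: baseFocalLength)
    return assumedAngle / trueAngle
}

for zoom in [1.0, 2.0, 4.0, 8.0] {
    print("zoom \(zoom)x -> correction is roughly \(overshootFactor(zoom: zoom)) times too large")
}
```

For small offsets the overshoot is close to the zoom factor itself, which would match the symptom of the stand swinging harder the further you zoom in.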

These snippets might be of use to you:

if let captureConnection = videoDataOutput.connection(with: .video) {
    captureConnection.isEnabled = true
    captureConnection.isCameraIntrinsicMatrixDeliveryEnabled = true
}

[God almighty. Why is it so impossible to format code in this editor?]

This function pulls out the intrinsics and computes the field-of-view, but that was for something I was doing; just the intrinsics matrix here might be what you want:

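nonisolated func computeFOV(_ sampleBuffer: CMSampleBuffer) -> Double? {
    // The intrinsic matrix is attached to each sample buffer as Data once
    // isCameraIntrinsicMatrixDeliveryEnabled is set on the connection.
    guard let camData = CMGetAttachment(sampleBuffer,
                                        key: kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix,
                                        attachmentModeOut: nil) as? Data else { return nil }
    let intrinsics: matrix_float3x3? = camData.withUnsafeBytes { pointer in
        if let baseAddress = pointer.baseAddress {
            return baseAddress.assumingMemoryBound(to: matrix_float3x3.self).pointee
        }
        return nil
    }

    guard let intrinsics = intrinsics else { return nil }

    let fx = intrinsics[0][0]       // horizontal focal length, in pixels
    let w = 2 * intrinsics[2][0]    // 2 * cx, used as the image width
    // Note: atan(w / (2 * fx)) is the half-angle in radians; double it for the full horizontal FOV.
    return Double(atan2(w, 2*fx))
}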
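For reference, here is a small standalone sketch of the field-of-view arithmetic on its own, using plain numbers instead of a sample buffer. `horizontalFOV` is a hypothetical helper and the focal lengths are made-up values. It returns the full angle, 2 * atan(w / (2 * fx)), whereas the expression atan2(w, 2*fx) in the function above gives half of that.

```swift
import Foundation

/// Full horizontal field of view, in radians, under a pinhole model.
/// - fx: horizontal focal length in pixels (element [0][0] of the intrinsic matrix)
/// - imageWidth: width in pixels of the frame the intrinsics refer to
func horizontalFOV(fx: Double, imageWidth: Double) -> Double {
    2 * atan(imageWidth / (2 * fx))
}

// Hypothetical values: a 1920 px wide frame and an assumed 1x focal length.
let frameWidth = 1_920.0
let focalLengthAt1x = 1_500.0

for zoom in [1.0, 2.0, 4.0] {
    // Zooming scales the effective focal length, so the field of view narrows.
    let fov = horizontalFOV(fx: focalLengthAt1x * zoom, imageWidth: frameWidth)
    let degrees = fov * 180 / Double.pi
    print("zoom \(zoom)x -> horizontal FOV of about \(degrees) degrees")
}
```

The field of view narrows as fx grows, which is why the intrinsics delivered with each frame are what tell the tracker how far off-centre the subject really is at the current zoom.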

Again, sorry for the totally ****** formatting. If someone can tell me how this is supposed to work, I'm all ears. I pasted code and hit "code block" but it didn't help much.
