Core ML Async API Seems to Not Work Properly

I'm running into issues with the Core ML async prediction API: it doesn't seem to work correctly. It consistently hangs at the "03 performInference, after get smallInput, before prediction" step, as shown in the attached log:

Below is my code. Could you please advise on how I should modify it?
private func createFrameAsync(for sampleBuffer: CMSampleBuffer) {
    guard let pixelBuffer = sampleBuffer.imageBuffer else { return }

    Task {
        print("**** createFrameAsync before performInference")
        
        do {
            try await runModelAsync(on: pixelBuffer)
        } catch {
            print("Error processing frame: \(error)")
        }
        
        print("**** createFrameAsync after  performInference")
    }
}

func runModelAsync(on pixelbuffer: CVPixelBuffer) async {
    print("01 performInference, before resizeFrame")
    
    guard let data = metalResizeFrame(sourcePixelFrame: pixelbuffer, targetSize: MTLSize(width: InputWidth, height: InputHeight, depth: 1), resizeMode: .scaleToFill) else {
        os_log("Preprocessing failed", type: .error)
        return
    }
    
    print("02 performInference, after  resizeFrame, before get smallInput")
    
    let input = model_smallInput(input: data)
    
    print("03 performInference, after  get smallInput, before prediction")
    
    guard let model = mlmodel else { return }
    if let prediction = try? await model.model.prediction(from: input) {
        
        print("04 performInference, after  prediction, before get result")
        
        var results: [Float] = []
        if let output = prediction.featureValue(for: "output")?.multiArrayValue,
           let bufferPointer = try? UnsafeBufferPointer<Float>(output) {
            results = Array(bufferPointer)
        }
        
        print("05 performInference, after  get result, before setRenderData")

        let localResults = results
        await MainActor.run {
            ScreenRecorder.shared
                .setRenderDataNormalized(
                    screenImage: pixelbuffer,
                    depthData: localResults
                )
        }
        
        print("06 performInference, after  setRenderData")
    }
}
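A side note on this version: createFrameAsync spawns a new unstructured Task for every captured frame, so when a prediction takes longer than the frame interval, tasks pile up behind the model. A common way to handle this in capture pipelines is to drop frames while one is already in flight. Below is a minimal, self-contained sketch of that pattern; `FrameGate` and the simulated delay are hypothetical stand-ins, not part of Core ML or ScreenCaptureKit:

```swift
import Foundation

// Hypothetical sketch (not a Core ML API): drop frames while a prediction
// is in flight instead of queuing an unbounded Task per frame.
actor FrameGate {
    private var busy = false
    private(set) var processed = 0
    private(set) var dropped = 0

    // Returns true if the caller won the slot and should run the model.
    func tryAcquire() -> Bool {
        if busy {
            dropped += 1
            return false
        }
        busy = true
        return true
    }

    func release() {
        processed += 1
        busy = false
    }
}

let gate = FrameGate()

// Simulate 5 frames arriving faster than they can be processed.
await withTaskGroup(of: Void.self) { group in
    for _ in 0..<5 {
        group.addTask {
            if await gate.tryAcquire() {
                try? await Task.sleep(nanoseconds: 100_000_000) // stand-in for inference
                await gate.release()
            }
            // else: frame dropped, no work queued
        }
    }
}

let processed = await gate.processed
let dropped = await gate.dropped
print("processed: \(processed), dropped: \(dropped)")
```

In the real pipeline, `tryAcquire` would gate the `Task { try await runModelAsync(...) }` in createFrameAsync, so stale frames are skipped instead of queued.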

I updated to a version that runs without crashing, but the prediction speed is almost the same as with the synchronous API. createFrameAsync is called from a ScreenCaptureKit stream.

    private func createFrameAsync(for sampleBuffer: CMSampleBuffer) {
        if let surface = getIOSurface(for: sampleBuffer) {
            Task {
                do {
                    try await runModelAsync(surface)
                } catch {
                    os_log("error: \(error)")
                }
            }
        }
    }

    func runModelAsync(_ surface: IOSurface) async throws {
        try Task.checkCancellation()
        guard let model = mlmodel else {return}

        do {
            // Resize input
            var px: Unmanaged<CVPixelBuffer>?
            let status = CVPixelBufferCreateWithIOSurface(kCFAllocatorDefault, surface, nil, &px)
            guard status == kCVReturnSuccess, let px2 = px?.takeRetainedValue() else { return }
            guard let data = resizeIOSurfaceIntoPixelBuffer(
                of: px2,
                from: CGRect(x: 0, y: 0, width: InputWidth, height: InputHeight)
            ) else { return }

            // Model Prediction
            var results: [Float] = []
            let inferenceStartTime = Date()
            let input = model_smallInput(input: data)
            let prediction = try await model.model.prediction(from: input)
            print("prediction took \(Date().timeIntervalSince(inferenceStartTime)) s")

            // Get result into format
            if let output = prediction.featureValue(for: "output")?.multiArrayValue {
                if let bufferPointer = try? UnsafeBufferPointer<Float>(output) {
                    results = Array(bufferPointer)
                }
            }

            // Set Render Data for Metal Rendering
            await ScreenRecorder.shared
                .setRenderDataNormalized(surface: surface, depthData: results)
        } catch {
            print("Error performing inference: \(error)")
        }
    }
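For what it's worth, the latency of a single prediction is governed mostly by which compute units the model is allowed to use, and that is chosen at model load time rather than at prediction time. A sketch, where `modelURL` is a hypothetical path to the compiled model:

```swift
import CoreML

// Sketch: selecting compute units when loading the model.
// `modelURL` is a hypothetical path to the compiled .mlmodelc bundle.
let config = MLModelConfiguration()
config.computeUnits = .all  // alternatives: .cpuOnly, .cpuAndGPU, .cpuAndNeuralEngine

let mlmodel = try MLModel(contentsOf: modelURL, configuration: config)
```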

Since the async prediction API cannot speed up the prediction, is there anything else I can do? The prediction time is almost the same on a MacBook M2 Pro and a MacBook M1 Air!
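For context, the async prediction API is aimed at throughput rather than single-call latency: it lets several predictions be in flight concurrently, but one prediction by itself takes about the same time as with the sync API. The effect can be sketched in plain Swift concurrency; `fakePredict` is a stand-in for the model call, not a Core ML API:

```swift
import Foundation

// Stand-in for MLModel.prediction(from:); the sleep simulates inference latency.
func fakePredict(_ frame: Int) async throws -> Int {
    try await Task.sleep(nanoseconds: 100_000_000) // ~0.1 s of "inference"
    return frame
}

// Run several "predictions" concurrently; with a concurrent backend the
// total wall time approaches one prediction's latency, not the sum of all.
func predictBatch(_ frames: [Int]) async throws -> [Int] {
    try await withThrowingTaskGroup(of: Int.self) { group in
        for frame in frames {
            group.addTask { try await fakePredict(frame) }
        }
        var results: [Int] = []
        for try await r in group {
            results.append(r)
        }
        return results.sorted()
    }
}

let start = Date()
let results = try await predictBatch(Array(0..<8))
let wall = Date().timeIntervalSince(start)
print("results: \(results), wall time: \(String(format: "%.2f", wall)) s")
```

In the real pipeline this corresponds to letting several frames' predictions overlap (for example, one Task per frame) rather than awaiting each prediction before starting the next; it will not make any individual frame faster.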
