Core ML model execution sometimes fails under load

I'm processing a 4K video with a complex Core Image pipeline that also invokes a neural style transfer Core ML model. This works well most of the time, but for a small number of frames the model execution fails with the following error messages:

Execution of the command buffer was aborted due to an error during execution. Internal Error (0000000e:Internal Error)
Error: command buffer exited with error status.
    The Metal Performance Shaders operations encoded on it may not have completed.
    Error: 
    (null)
    Internal Error (0000000e:Internal Error)
    <CaptureMTLCommandBuffer: 0x280b95d90> -> <AGXG15FamilyCommandBuffer: 0x108f143c0>
    label = <none> 
    device = <AGXG15Device: 0x106034e00>
        name = Apple A16 GPU 
    commandQueue = <AGXG15FamilyCommandQueue: 0x1206cee40>
        label = <none> 
        device = <AGXG15Device: 0x106034e00>
            name = Apple A16 GPU 
    retainedReferences = 1
[espresso] [Espresso::handle_ex_plan] exception=Espresso exception: "Generic error": Internal Error (0000000e:Internal Error); code=1 status=-1
[coreml] Error computing NN outputs -1
[coreml] Failure in -executePlan:error:.

It's really hard to reproduce since it only happens occasionally. I also haven't found a way to access the Internal Error mentioned in the log, so I don't know the actual reason for the failure.

Any advice would be appreciated!

Hello, this looks like a genuine bug in Core ML / Metal Performance Shaders. Would you mind filing a bug report at http://feedbackassistant.apple.com/ with a sysdiagnose captured from the device after reproducing the issue, along with the error message you posted above?
