High CPU usage with CoreImage vs Metal

I am processing CVPixelBuffers received from the camera using both Metal and Core Image, and comparing the performance. The only processing done is taking a source pixel buffer, applying crop and affine transforms, and saving the result to another pixel buffer. What I notice is that CPU usage is as high as 50% when using Core Image but only 20% when using Metal. The profiler shows most of the time is spent in CIContext render:

    let cropRect = AVMakeRect(aspectRatio: CGSize(width: dstWidth, height: dstHeight),
                              insideRect: srcImage.extent)

    var dstImage = srcImage.cropped(to: cropRect)

    // Move the cropped image back to the origin.
    let translationTransform = CGAffineTransform(translationX: -cropRect.minX, y: -cropRect.minY)

    var transform = CGAffineTransform.identity
    transform = transform.concatenating(CGAffineTransform(translationX: -(dstImage.extent.origin.x + dstImage.extent.width / 2),
                                                          y: -(dstImage.extent.origin.y + dstImage.extent.height / 2)))
    transform = transform.concatenating(translationTransform)
    transform = transform.concatenating(CGAffineTransform(translationX: dstImage.extent.origin.x + dstImage.extent.width / 2,
                                                          y: dstImage.extent.origin.y + dstImage.extent.height / 2))

    dstImage = dstImage.transformed(by: translationTransform)

    // Aspect-fill scale from the cropped size to the destination size.
    let scale = max(dstWidth / dstImage.extent.width, dstHeight / dstImage.extent.height)
    let scalingTransform = CGAffineTransform(scaleX: scale, y: scale)

    transform = CGAffineTransform.identity
    transform = transform.concatenating(scalingTransform)
    dstImage = dstImage.transformed(by: transform)

    if flipVertical {
        dstImage = dstImage.transformed(by: CGAffineTransform(scaleX: 1, y: -1))
        dstImage = dstImage.transformed(by: CGAffineTransform(translationX: 0, y: dstImage.extent.size.height))
    }

    if flipHorizontal {
        dstImage = dstImage.transformed(by: CGAffineTransform(scaleX: -1, y: 1))
        dstImage = dstImage.transformed(by: CGAffineTransform(translationX: dstImage.extent.size.width, y: 0))
    }

    var dstBounds = CGRect.zero
    dstBounds.size = dstImage.extent.size

    _ciContext.render(dstImage, to: dstPixelBuffer!, bounds: dstImage.extent, colorSpace: srcImage.colorSpace)
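For reference, the crop and aspect-fill math above reduces to a few lines of arithmetic. Here is a minimal pure-Swift sketch; `Rect`, `aspectFitRect`, and `fillScale` are illustrative stand-ins for CGRect, AVMakeRect, and the `max(...)` scale computation, so the math stands on its own without Apple frameworks:

```swift
// Pure-Swift sketch of the crop + scale math above; no frameworks needed.
struct Rect { var x, y, width, height: Double }

// Equivalent of AVMakeRect(aspectRatio:insideRect:): the largest centered
// rect with the given aspect ratio that fits inside `bounds`.
func aspectFitRect(aspectWidth: Double, aspectHeight: Double, in bounds: Rect) -> Rect {
    let s = min(bounds.width / aspectWidth, bounds.height / aspectHeight)
    let w = aspectWidth * s
    let h = aspectHeight * s
    return Rect(x: bounds.x + (bounds.width - w) / 2,
                y: bounds.y + (bounds.height - h) / 2,
                width: w, height: h)
}

// Aspect-fill scale that maps the cropped rect onto the destination size.
func fillScale(dstWidth: Double, dstHeight: Double, crop: Rect) -> Double {
    max(dstWidth / crop.width, dstHeight / crop.height)
}

// Example: portrait 720x1280 output from a landscape 1920x1080 source.
let src = Rect(x: 0, y: 0, width: 1920, height: 1080)
let crop = aspectFitRect(aspectWidth: 720, aspectHeight: 1280, in: src)
// crop: full source height, 607.5 pt wide, centered at x = 656.25
let scale = fillScale(dstWidth: 720, dstHeight: 1280, crop: crop)
```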

Here is how CIContext was created:

_ciContext = CIContext(mtlDevice: MTLCreateSystemDefaultDevice()!, options: [CIContextOption.cacheIntermediates: false])

Am I doing anything wrong, and what can be done to lower the CPU usage with Core Image?

Every time you render a CIImage with a CIContext, CI does a filter graph analysis to determine the best path for rendering the image (determining intermediates, region of interest, kernel concatenation, etc.). This can be quite CPU-intensive.

If you only have a few simple operations to perform on your image, and you can easily implement them in Metal directly, you are probably better off using that.
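As an illustration of how small the GPU-side work actually is: the whole crop + scale + flip chain collapses into a single affine matrix that one Metal pass (for instance a vertex-shader uniform or a single blit) could apply. A pure-Swift sketch of composing and checking that matrix, where the `Affine` struct is a hypothetical stand-in for CGAffineTransform so the math stands alone:

```swift
// Composing crop + scale + vertical flip into one affine matrix.
// `Affine` mimics CGAffineTransform (row-vector convention):
//   x' = a*x + c*y + tx,  y' = b*x + d*y + ty
struct Affine {
    var a, b, c, d, tx, ty: Double
    static let identity = Affine(a: 1, b: 0, c: 0, d: 1, tx: 0, ty: 0)

    // Apply `self` first, then `next` (same order as CGAffineTransform.concatenating).
    func concatenating(_ n: Affine) -> Affine {
        Affine(a: a * n.a + b * n.c,           b: a * n.b + b * n.d,
               c: c * n.a + d * n.c,           d: c * n.b + d * n.d,
               tx: tx * n.a + ty * n.c + n.tx, ty: tx * n.b + ty * n.d + n.ty)
    }

    func apply(x: Double, y: Double) -> (Double, Double) {
        (a * x + c * y + tx, b * x + d * y + ty)
    }
}

// Move the crop origin to (0, 0), scale to fill, then flip vertically.
func cropScaleFlip(cropX: Double, cropY: Double, scale: Double, dstHeight: Double) -> Affine {
    Affine.identity
        .concatenating(Affine(a: 1, b: 0, c: 0, d: 1, tx: -cropX, ty: -cropY))
        .concatenating(Affine(a: scale, b: 0, c: 0, d: scale, tx: 0, ty: 0))
        .concatenating(Affine(a: 1, b: 0, c: 0, d: -1, tx: 0, ty: dstHeight))
}

// Same numbers as the 1920x1080 → 720x1280 example: crop starts at x = 656.25.
let m = cropScaleFlip(cropX: 656.25, cropY: 0, scale: 1280.0 / 1080.0, dstHeight: 1280)
// Bottom-left of the crop maps to the bottom of the flipped destination.
let p = m.apply(x: 656.25, y: 0)  // → (0, 1280)
```

One matrix, applied in one pass, is the Metal-side equivalent of the whole CGAffineTransform chain in the question.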

However, I would also suggest you file a Feedback with the Core Image team and report your findings. We also observe a very heavy CPU load caused by Core Image in our apps. Maybe they can find a way to further optimize the graph analysis, especially for consecutive render calls with the same instructions.

Can someone from Apple please respond to this?

I adopted Core Image instead of Metal because it was recommended by the Core Image team in various WWDC videos such as wwdc2020-10008, only to realize that my UI is getting slower due to very high CPU usage by Core Image (as much as 3x-4x compared to raw Metal). It seems I need to rewrite the entire filter chain in plain Metal unless there is a workaround for the high CPU usage. I have submitted a feedback request to DTS with fully reproducible sample code.

Really looking forward to answers by Core Image team here.

Dear Core Image Engineering Team,

I wasted a DTS credit to get this answered, and all I was advised to do was debug using CI_PRINT_TREE and look for "optimisations" where none are possible (the code I submitted with the bug report contained a minimal CIImage rendering pipeline). The DTS engineer acknowledged the significant difference in CPU usage between the Metal and Core Image pipelines but was unable to offer anything other than recommending I file a bug report. I have already filed FB12619176 and am looking to hear directly from the Core Image engineering team, because high CPU usage rules out Core Image for video pipelines. It may be acceptable for photo editing, but not for live video or video-editing tasks.
