Why does my Core ML custom layer still run on the CPU even though I enabled the Metal API in the `encode` function?

Question

Created Jan ’22

Replies 1

Participants 2

I implemented a custom PyTorch layer for both CPU and GPU, following [Hollemans' amazing blog](https://machinethink.net/blog/coreml-custom-layers). The CPU version works well, but after I implemented the op for the GPU, the `encode` function is never called and the layer always runs on the CPU. I have tried coremltools.convert() with compute_units=coremltools.ComputeUnit.CPU_AND_GPU, but it still does not work. The same problem is mentioned in https://stackoverflow.com/questions/51019600/why-i-enabled-metal-api-but-my-coreml-custom-layer-still-run-on-cpu and https://developer.apple.com/forums/thread/695640. Any ideas on how to solve this would be appreciated.

System Information

macOS: 11.6.1 Big Sur
Xcode: 12.5.1
coremltools: 5.1.0
Test device: iPhone 11

Answer 1

csw0pe OP

Jun ’23

I have also been having issues getting the encodeToCommandBuffer function to be called.

One thing that I had to do was make sure the input was big enough. When running a custom layer on an image with shape (1, 8, 32, 32), for example, the CPU implementation was called. When I scaled that up to (1, 96, 256, 256), the GPU function was called instead. There seems to be some heuristic inside Core ML that looks at tensor size when deciding whether to run a custom layer on the CPU or the GPU. I'm not sure whether it looks at the size of the outputs, the inputs, or something else, but it's doing something like that. It likely depends on the device you're running on as well. A debug/verbose mode would be really nice for figuring out how Core ML arrives at these decisions.
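Since the threshold is undocumented, one way to narrow it down is to run the layer on a series of intermediate shapes and log whether the CPU or GPU implementation gets called for each. Below is a minimal Python sketch that only generates such a sweep between the two shapes mentioned above. `sweep_shapes` is a hypothetical helper, not part of coremltools or Core ML, and actually running each shape through the model on the device is left out.

```python
from math import prod

# Shapes reported above, as (batch, channels, height, width).
SMALL = (1, 8, 32, 32)      # reported to run on the CPU
LARGE = (1, 96, 256, 256)   # reported to run on the GPU


def element_count(shape):
    """Total number of values in a tensor of the given shape."""
    return prod(shape)


def sweep_shapes(small, large, steps):
    """Hypothetical helper: interpolate each dimension linearly
    from `small` to `large`, including both endpoints."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    shapes = []
    for i in range(steps):
        t = i / (steps - 1)
        shapes.append(tuple(round(s + t * (l - s)) for s, l in zip(small, large)))
    return shapes


# Print each candidate shape with its element count; test each one on the
# device and note which implementation is invoked.
for shape in sweep_shapes(SMALL, LARGE, steps=5):
    print(shape, element_count(shape))
```

Once two neighbouring shapes land on different sides of the switch, the same helper can be called again with those two shapes as the new endpoints to bisect further. The result may well differ between devices.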

I also noticed that the input tensor had to be somewhat 'image shaped'. When attempting to pass (1, 500_000, 1, 1) through a custom layer, it was executed on the CPU. When passing (1, 50, 100, 100) through (the same number of elements as the previous shape), it was executed on the GPU. I guess this has to do with the encodeToCommandBuffer function accepting MTLTexture objects as inputs/outputs: maybe there is some limit on the number of channels, etc.
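To make the channel-limit guess concrete, here is a rough Python sketch comparing the two shapes. It rests on two assumptions that this thread does not confirm: that channels are packed four per texture-array slice (RGBA), and that a 2D texture array can hold at most 2048 slices. Both constants and the `texture_slices` helper are illustrative, and whether Core ML's heuristic actually checks anything like this is unknown.

```python
from math import ceil, prod

# Assumptions (not confirmed by the thread): four channels per slice (RGBA),
# and at most 2048 slices in a 2D texture array.
CHANNELS_PER_SLICE = 4
MAX_ARRAY_SLICES = 2048


def texture_slices(shape):
    """Slices needed for a (batch, channels, height, width) tensor
    under the packing assumption above."""
    batch, channels, _, _ = shape
    return batch * ceil(channels / CHANNELS_PER_SLICE)


# Print shape, element count, slice count, and whether it fits the assumed limit.
for shape in [(1, 500_000, 1, 1), (1, 50, 100, 100)]:
    slices = texture_slices(shape)
    print(shape, prod(shape), slices, slices <= MAX_ARRAY_SLICES)
```

Under these assumptions both shapes hold 500,000 values, but the first would need 125,000 slices while the second needs only 13, which would be consistent with the first one falling back to the CPU.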
