How to specify that a CoreML model should be deployed in half precision?

When converting a Caffe model to Core ML with coremltools, there is unfortunately no option to specify whether the resulting model should run in half or full precision.


I would like to make explicit use of the fp16 ALUs on the GPUs of, for example, the A8/A9 chips during inference.


Is there a way to do this? Or does it happen magically anyway, with Core ML optimizing ALU utilization for the underlying hardware?

Replies

I found a similar discussion of the data type conversion for inputs and outputs (to float32) here: https://forums.developer.apple.com/thread/84401

However, after converting my CNN's inputs and outputs to float32 there was no impact on processing speed. I am going to try float16 later.
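For reference, later releases of coremltools (2.0 and up, which postdate this thread's Caffe-converter workflow) added a weight-quantization utility that can convert a saved model's weights to float16. A minimal sketch, assuming that newer API is available; the model paths are hypothetical:

```python
def quantize_to_fp16(model_path: str, out_path: str):
    """Convert a Core ML model's weights to float16 using the
    quantization utility added in coremltools 2.0 (assumed available;
    it did not exist at the time of the original Caffe conversion)."""
    # Imports are deferred so this sketch can be read/loaded without
    # coremltools installed.
    import coremltools
    from coremltools.models.neural_network import quantization_utils

    model = coremltools.models.MLModel(model_path)  # hypothetical path
    fp16_model = quantization_utils.quantize_weights(model, nbits=16)
    fp16_model.save(out_path)
    return fp16_model
```

Note that this halves the on-disk weight storage; whether inference actually runs in fp16 still depends on the compute device Core ML picks at runtime.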

Core ML will already use half floats during inference when running on the GPU. You can see this for yourself by using GPU Frame Capture in Xcode while a Core ML model is making predictions. The Metal Performance Shaders CNN kernels it uses mostly run on float16 anyway.
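To get a feel for what running on float16 means numerically, here is a small stdlib-only Python sketch that round-trips values through IEEE 754 half precision (the `struct` module's `"e"` format). float16 has a 10-bit mantissa, so weights keep only about three significant decimal digits:

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision,
    the storage format half-float GPU kernels operate in."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# Small weight values lose precision beyond ~3 decimal digits:
print(to_fp16(0.1))      # 0.0999755859375, not exactly 0.1

# Above 2048 the spacing between representable values is 2,
# so not every integer survives the conversion:
print(to_fp16(2049.0))   # 2048.0
```

For typical CNN weights (small magnitudes, well inside fp16 range) this loss is usually harmless, which is why the MPS kernels can get away with half precision.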