import coremltools as ct
from coremltools.models.neural_network import quantization_utils
# load full precision model
model_fp32 = ct.models.MLModel(modelPath)
model_fp16 = quantization_utils.quantize_weights(model_fp32, nbits=16)
model_fp16.save("reduced-model.mlmodel")
I'm testing it with the model from one of Apple's source codes(GameBoardDetector), and it works fine, reduces the model size by half. But there are several problems with my model(trained on CreateML app using Full Network):
- Quantizing to float 16 does not work(new file gets created with reduced only 0.1mb).
- Quantizing to below 16 values cause errors, and no file gets created.
Here are additional metadata and precisions of models.
Working model's additional metadata and precision:
Mine's additional metadata and precision: