Does the Apple Neural Engine support 8-bit integer inference? Quantizing the weights to 8 bits reduced storage to a quarter of the original, but the inference speed did not change. Is that expected?
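For reference, this is roughly how I quantized the model (a minimal sketch using coremltools' quantization_utils; the file names are hypothetical). My understanding is that this is weight-only quantization, so activations and compute may still run in float, which might explain the unchanged speed. Is that right?

import coremltools as ct
from coremltools.models.neural_network import quantization_utils

# Load the full-precision (float32) model; file name is hypothetical.
model = ct.models.MLModel("MyModel.mlmodel")

# Weight-only 8-bit linear quantization: shrinks the file on disk,
# but weights are dequantized at runtime, so compute stays in float.
quantized = quantization_utils.quantize_weights(model, nbits=8)
quantized.save("MyModel_8bit.mlmodel")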
Device: iPhone 11
Config:

import CoreML

let configuration = MLModelConfiguration()
configuration.computeUnits = .all  // allow CPU, GPU, and the Neural Engine
let myModel = try! myCoremlModel(configuration: configuration).model
Set the range for each dimension:

import coremltools as ct

input_shape = ct.Shape(shape=(
    1, 3,
    ct.RangeDim(lower_bound=128, upper_bound=384, default=256),
    ct.RangeDim(lower_bound=128, upper_bound=384, default=256),
))
Inference time (average of 100 runs):
Inference at the default size with the dynamic model is as fast as with the static model, but the 128×128 and 384×384 sizes are hundreds of times slower than the corresponding fixed-size models. Is this normal? Is there a good solution?
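For example, would switching from RangeDim to EnumeratedShapes help, since only a few sizes are actually needed? A sketch of what I mean (the source model name is hypothetical):

import coremltools as ct

# Enumerate the exact input sizes instead of a continuous range,
# so Core ML can pre-plan each shape at load time.
input_shape = ct.EnumeratedShapes(
    shapes=[[1, 3, 128, 128], [1, 3, 256, 256], [1, 3, 384, 384]],
    default=[1, 3, 256, 256],
)
mlmodel = ct.convert(
    traced_model,  # hypothetical traced source model
    inputs=[ct.TensorType(name="input", shape=input_shape)],
)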
Model init time is too long
Loading the model takes about 2 minutes. Is there a way to speed it up, for example by loading from a cache? Can converting to an .mlpackage speed up loading?
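This is the conversion I have in mind (a minimal sketch; the source model name is hypothetical). Saving an ML Program produces an .mlpackage, and I would benchmark whether it loads any faster:

import coremltools as ct

# Convert to the ML Program format; saving it produces an .mlpackage.
mlmodel = ct.convert(
    traced_model,  # hypothetical traced source model
    convert_to="mlprogram",
)
mlmodel.save("MyModel.mlpackage")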