Unconstrained input image height and width for conv net in Core ML

I am trying to port a fully convolutional network to Core ML. In TensorFlow, it can take an image of any width and height as input. However, in the Core ML model specification (FeatureTypes.proto), both ImageFeatureType and ArrayFeatureType specify the input dimensions explicitly. The Core ML compiler in Xcode then translates this into something like MultiArray<Double, 3, 128, 128>, where 128 is the image width and height, which is undesirable.


In TensorFlow, I would specify the input shape as (None, None, 3) in Python. Is there a way to allow an image of any size as input for Core ML? What should the protobuf message look like? Such a feature would be very useful for style transfer and image segmentation.
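For reference, this is roughly how the fixed shape shows up when I inspect the converted model's spec with coremltools (the file name and input name below are just placeholders):

import coremltools

# Load the protobuf spec of the converted model (placeholder file name).
spec = coremltools.utils.load_spec('FCN.mlmodel')

# Every input carries an explicit size, either as an image or as a multi-array.
for inp in spec.description.input:
    t = inp.type
    if t.HasField('imageType'):
        print(inp.name, 'image:', t.imageType.width, 'x', t.imageType.height)
    elif t.HasField('multiArrayType'):
        print(inp.name, 'multi-array shape:', list(t.multiArrayType.shape))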


Thanks in advance.

Replies

I don't know how to do this with Core ML but keep in mind that a mobile device has limited resources. Running an FCN over a 10-megapixel image, for example, is bound to be very slow.


You may want to consider using a fixed-size input image, of 512x512 pixels or so, and then using an upsampling stage at the end to scale the result up to your original image size.
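Something along these lines, as a rough sketch only (the model file, feature names, and output layout are made up for illustration, and coremltools prediction runs on macOS):

import numpy as np
from PIL import Image
import coremltools

# Hypothetical segmentation model with a fixed 512x512 image input named 'image'
# and a class-score output named 'scores' of shape (C, 512, 512).
model = coremltools.models.MLModel('segmenter.mlmodel')

original = Image.open('photo.jpg')
resized = original.resize((512, 512), Image.BILINEAR)

scores = model.predict({'image': resized})['scores']
mask = np.argmax(scores, axis=0).astype(np.uint8)   # per-pixel class labels

# Scale the result back up to the original resolution.
mask_full = Image.fromarray(mask).resize(original.size, Image.NEAREST)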


(By the way, when Core ML does not support a particular feature, you can always drop down a few levels and use MPS or Metal to implement your network.)

I think running an FCN could be common in many apps, considering that a user's image or video can have any aspect ratio. I will definitely resize the image first, thanks for the reminder, so running time should not be a big problem for the FCN. I think it is quite reasonable to implement this feature in the Core ML compiler, or has it been implemented already?

Core ML does not resize the image for you. If you use Core ML through the Vision framework then Vision handles resizing, but if you use Core ML directly you'll have to resize the image yourself.

I agree.

We do need the ability to support different input sizes; this feature is very useful.

For one of our models in production, we support different input sizes to adapt to different device categories.

With the current Core ML implementation, we have to ship multiple models for different input sizes; otherwise the Core ML runtime complains about the size mismatch.
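For what it's worth, here is roughly how such fixed-size copies can be generated from a single spec with coremltools. The file names are placeholders, it assumes the only input is an image, and it only has a chance of working when every layer is size-agnostic (fully convolutional), so treat it as a sketch rather than a guaranteed workaround:

import coremltools

spec = coremltools.utils.load_spec('fcn_512.mlmodel')

for side in (256, 384, 512):
    # Rewrite the declared input size and save a separate model per size.
    # (Fixed output shape descriptions may need the same treatment.)
    spec.description.input[0].type.imageType.width = side
    spec.description.input[0].type.imageType.height = side
    coremltools.utils.save_spec(spec, 'fcn_%d.mlmodel' % side)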

Actually, we just don't want to change the aspect ratio of the input image, in order to better detect the object, and we also want to run it at multiple scales while only loading the network once.

So an unconstrained input size is quite important for us!

Besides, I suppose it's not very difficult to support this in Core ML?

I really wonder about this: the face detection in Vision, which is also based on Core ML, seems to detect faces at multiple scales (5), according to Apple's blog here: https://machinelearning.apple.com/2017/11/16/face-detection.html

How did Apple manage to do this with a fixed input size?

I guess Apple used multiple CPU/GPU cores, with each core loading the model with the same weights but a different input size defined?