Same Model, Wildly Different Results per Platform

I have a model that I trained in Keras and then converted. It takes a four-channel image and returns a two-channel image of the same size (which is processed to create a mask, should a particular class of object be detected). The model is similar to the 2014 "Fully Convolutional Networks for Semantic Segmentation." CoreML sees the input picture and output as MLMultiArrays, and I can load test data into these.


The model works fine in Keras on Linux or Mac. I set aside an image for testing on various platforms, with a known "good" output.


Converting from Keras to CoreML succeeds, and I have run the resulting CoreML model in coremltools with the reference image. It delivers the correct result image, matching the results from Keras/TF. I added this .mlmodel resource to a test framework in Xcode.

Using the same reference imagery, not only does the model generate incorrect values on iOS, but I also get wildly different values depending on which iOS environment I run it in: the simulator or my iPad Pro (recent 10.5", aka A1701).

Are there operations in CoreML that are known to cause issues when running on iOS but that succeed in conversion and when running on the Mac? This one is a real puzzler, since the static data and code are the same between these two different iOS runs, and have already been vetted on the Mac using the same .mlmodel resource.

I might write an additional Swift-based test on the Mac, I suppose, but if coremltools says the resource is okay, I don't think that would get me any closer to running my model under iOS. Am I missing something, some special modality?

Replies

One typical difference between running the model in Python and running it on iOS is that images loaded into Python often have pixels in the range 0 - 1, while on iOS the pixels are in the range 0 - 255. If this is the case, you may need to add a preprocessing step to the model.


Also see: www.machinethink.net/blog/help-core-ml-gives-wrong-output/

Thanks! Yes, that's true, but I don't think this is the case for my issue.


I'm loading the MLMultiArrays from a CGImage, rather than having iOS load them, because I have a four-channel image and so CoreML insists that the input be MLMultiArray rather than a standard image type. I do the preprocessing at the loading stage. So if I print out the values from my Python/CoreML version:

IOS Test image reshaped to (1, 1, 4, 224, 224) for CoreML
  input array ch 0 range -104.0510 to 150.9490 (255.000000)
  input array ch 1 range -112.5145 to 136.4855 (249.000000)
  input array ch 2 range -116.6760 to 119.3240 (236.000000)
  input array ch 3 range -95.6667 to 113.3333 (209.000000)


And compare to the values loaded into the MLMultiArray in the iOS Swift app (which always has a batch size of ONE, so the shape is simplified):

Input shape: [4, 224, 224]
input ch 0 range -104.0510 to 150.9490 (255.000000)
input ch 1 range -112.5145 to 136.4855 (249.000000)
input ch 2 range -116.6760 to 119.3240 (236.000000)
input ch 3 range -95.6667 to 113.3333 (209.000000)


I get this same Swift log info from both the simulator and the hardware device, but very different outputs after calling model.prediction()! A puzzle...
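
(For context, the loading code on the iOS side is roughly of the following shape. This is a simplified sketch rather than my exact code: the mean values are placeholders for whatever the network was trained with, and a real fourth data channel needs care because an RGBA CGContext stores premultiplied alpha.)

import CoreML
import CoreGraphics

// Sketch: fill a [4, 224, 224] MLMultiArray from a CGImage, subtracting a
// per-channel mean at load time. Substitute the model's actual training means.
func makeInputArray(from image: CGImage,
                    means: [Float] = [0, 0, 0, 0]) throws -> MLMultiArray {
    let width = 224, height = 224, channels = 4

    // Channel-first shape to match the model's expected (C, H, W) layout.
    let array = try MLMultiArray(shape: [NSNumber(value: channels),
                                         NSNumber(value: height),
                                         NSNumber(value: width)],
                                 dataType: .float32)

    // Render the CGImage into a raw RGBA byte buffer (4 bytes per pixel).
    // Note: context.draw scales the image to 224 x 224 if it isn't already.
    var pixels = [UInt8](repeating: 0, count: width * height * 4)
    pixels.withUnsafeMutableBytes { buffer in
        guard let context = CGContext(data: buffer.baseAddress,
                                      width: width, height: height,
                                      bitsPerComponent: 8,
                                      bytesPerRow: width * 4,
                                      space: CGColorSpaceCreateDeviceRGB(),
                                      bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue)
        else { return }
        context.draw(image, in: CGRect(x: 0, y: 0, width: width, height: height))
    }

    // Copy into the MLMultiArray in (channel, row, column) order, applying
    // the mean subtraction as each value is written.
    for y in 0..<height {
        for x in 0..<width {
            for c in 0..<channels {
                let value = Float(pixels[(y * width + x) * 4 + c]) - means[c]
                array[[c, y, x] as [NSNumber]] = NSNumber(value: value)
            }
        }
    }
    return array
}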

Just to be 100% sure that you're really passing the exact same input to both models: when you reshape the image to (1, 1, 4, 224, 224) in Python, does the color channel really come _before_ the height and width dimensions in the data? Normally when you load an image in Python the dimensions are height x width x channels, not channels x height x width. I just want to verify that you're also transposing and not just reshaping.


The simulator giving different results than a hardware device could be because the simulator uses the CPU, which may use 64-bit or 32-bit floats to perform the computations, whereas the device uses the GPU, which uses 16-bit floats. Try running it on the device with the usesCPUOnly flag set. Does it still give different results from the simulator then?

Yes, specifically in my Python code I call


image4Core = np.transpose(image, (2,0,1))[None,None,:,:,:]


to get the reshaped data, and then pass "image4Core" to the model loaded by coremltools, which generates a result identical to the one made by Keras and TF using the un-transposed image.


The device and the simulator now agree between the two of them, but they both return the (same) variant result. I am worried a bit about FP16... I know that the guys over at FAIR were having pretty bad precision problems as well, and have said they have to force their models to lightweight floats as early as possible.


I see a `useCPUOnly` flag in coremltools; is there one in iOS too? I haven't found it (yet).


Thanks again!

See the MLPredictionOptions class.
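
For anyone following along, a minimal sketch of what using it looks like (the class and feature names below are placeholders, not the actual names from this project):

import CoreML

// Sketch: call the model directly (no Vision), forcing CPU execution so the
// GPU's 16-bit float path is taken out of the picture. "MySegmentationModel"
// stands in for the Xcode-generated class, and "image"/"output" for the
// .mlmodel's real feature names.
func runOnCPU(with inputArray: MLMultiArray) throws -> MLMultiArray? {
    let options = MLPredictionOptions()
    options.usesCPUOnly = true

    // Use the underlying MLModel so the options can be passed explicitly.
    let mlModel = MySegmentationModel().model
    let provider = try MLDictionaryFeatureProvider(
        dictionary: ["image": MLFeatureValue(multiArray: inputArray)])
    let result = try mlModel.prediction(from: provider, options: options)
    return result.featureValue(for: "output")?.multiArrayValue
}

Comparing this CPU-only output against the default (GPU) output on the device is a quick way to see how much of the discrepancy is just float16 precision.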

All sorted! Thanks for the pointer re: MLPredictionOptions; it's not included in the generated boilerplate Swift "...Input" class and is very useful for testing.

Hey bjorke,

Did you fix your problem by just running the model on CPU only, or did you find the root cause?

Did you fix your problem??? I also have a similar problem. I hope I can have an email chat with you.
You can use "request.usesCPUOnly = true". The context code is listed here:
// Run the Core ML model through Vision, forcing CPU execution.
let request = VNCoreMLRequest(model: model!) { request, error in
    // The model's output arrives as an MLMultiArray feature value.
    let results = request.results as! [VNCoreMLFeatureValueObservation]
    let featureValue = results.first?.featureValue
    let m = featureValue!.multiArrayValue
    print(m)
}
// request.imageCropAndScaleOption = .scaleFill
request.usesCPUOnly = true

let handler = VNImageRequestHandler(ciImage: ciImage1)
try? handler.perform([request])
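
(Note that when going through Vision like this, usesCPUOnly is set on the VNRequest itself; when calling the Core ML model directly, the same thing is done through MLPredictionOptions, as mentioned earlier in the thread.)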