Incorrect output from CoreML using instance normalization

I would like to use instance normalization in an mlmodel. I wrote a custom conversion function for it in the Keras converter in coremltools 0.4.0. However, the instance normalization output is incorrect whenever it is preceded by a convolution layer. I have struggled with it for a whole day without a clue, so it would be great if anyone can offer help.
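For context, the layer my conversion emits for instance normalization is the CoreML batchnorm layer with computeMeanVar and instanceNormalization set. It is roughly what the following NeuralNetworkBuilder sketch would produce (this is just an illustration, not my actual converter code, and the exact add_batchnorm signature may differ between coremltools versions):

import numpy as np
import coremltools
from coremltools.models import datatypes
from coremltools.models.neural_network import NeuralNetworkBuilder

# Sketch: a single instance-normalization layer, expressed as a batchnorm
# layer with compute_mean_var=True and instance_normalization=True.
input_features = [("data", datatypes.Array(3, 8, 8))]
output_features = [("output1", datatypes.Array(3, 8, 8))]
builder = NeuralNetworkBuilder(input_features, output_features)

builder.add_batchnorm(
    name="instancenormalization_1",
    channels=3,
    gamma=np.array([1.0, 0.0, 0.0]),
    beta=np.zeros(3),
    mean=np.zeros(3),        # should be ignored when compute_mean_var is set
    variance=np.ones(3),     # should be ignored when compute_mean_var is set
    input_name="data",
    output_name="output1",
    compute_mean_var=True,
    instance_normalization=True,
    epsilon=1e-5,
)

coremltools.models.MLModel(builder.spec).save("instancenorm_test.mlmodel")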


The details:

To narrow down the problem, I handcrafted some simple networks.

[1st Test]

Make a simple single-layer network with instance normalization, as follows:

specificationVersion: 1
description {
  input {
    name: "data"
    type {
      imageType {
        width: 8
        height: 8
        colorSpace: RGB
      }
    }
  }
  output {
    name: "output1"
    type {
      multiArrayType {
        shape: 3
        shape: 8
        shape: 8
        dataType: DOUBLE
      }
    }
  }
}
neuralNetwork {
  layers {
    name: "instancenormalization_1"
    input: "data"
    output: "output1"
    batchnorm {
      channels: 3
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 1e-05
      gamma {
        floatValue: 1.0
        floatValue: 0.0
        floatValue: 0.0
      }
      beta {
        floatValue: 0.0
        floatValue: 0.0
        floatValue: 0.0
      }
      mean {
        floatValue: 0
        floatValue: 0
        floatValue: 0
      }
      variance {
        floatValue: 1
        floatValue: 1
        floatValue: 1
      }
    }
  }
  preprocessing {
    scaler {
      channelScale: 1.0
    }
  }
}

Since gamma is [1, 0, 0], it is expected that the 0th channel output will be roughly in the range -2 to 2 (since it is normalized), and the other 2 channels will be all zeros. Running it on iOS 11 beta 2 gives me the same answer as I calculate by hand, so there is no problem here.
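In case anyone wants to repeat the hand calculation: it is just per-channel normalization over the 8x8 spatial positions, e.g. in plain numpy:

import numpy as np

def instance_norm_reference(x, gamma, beta, eps=1e-5):
    # x has shape (C, H, W); each channel is normalized over its own H*W
    # values, then scaled by gamma and shifted by beta (both per channel).
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma[:, None, None] * x_hat + beta[:, None, None]

x = np.random.uniform(0, 255, size=(3, 8, 8))
out = instance_norm_reference(x, np.array([1.0, 0.0, 0.0]), np.zeros(3))
# Channel 0 ends up zero-mean/unit-variance (so roughly within -2..2 for
# 64 samples); channels 1 and 2 are all zeros because their gamma is 0.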


[2nd Test]

Make a 2-layer network with [1x1 convolution, instance normalization], as follows:

specificationVersion: 1
description {
  input {
    name: "data"
    type {
      imageType {
        width: 8
        height: 8
        colorSpace: RGB
      }
    }
  }
  output {
    name: "output1"
    type {
      multiArrayType {
        shape: 3
        shape: 8
        shape: 8
        dataType: DOUBLE
      }
    }
  }
}
neuralNetwork {
  layers {
    name: "convolution2d_1"
    input: "data"
    output: "convolution2d_1_output"
    convolution {
      outputChannels: 3
      kernelChannels: 3
      nGroups: 1
      kernelSize: 1
      kernelSize: 1
      stride: 1
      stride: 1
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
        floatValue: 1.0
        floatValue: 0.0
        floatValue: 0.0
        floatValue: 0.0
        floatValue: 1.0
        floatValue: 0.0
        floatValue: 0.0
        floatValue: 0.0
        floatValue: 1.0
      }
      bias {
        floatValue: 0
        floatValue: 0
        floatValue: 0
      }
      outputShape: 8
      outputShape: 8
    }
  }
  layers {
    name: "instancenormalization_1"
    input: "convolution2d_1_output"
    output: "output1"
    batchnorm {
      channels: 3
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 1e-05
      gamma {
        floatValue: 1.0
        floatValue: 0.0
        floatValue: 0.0
      }
      beta {
        floatValue: 0.0
        floatValue: 0.0
        floatValue: 0.0
      }
      mean {
        floatValue: 0
        floatValue: 0
        floatValue: 0
      }
      variance {
        floatValue: 1
        floatValue: 1
        floatValue: 1
      }
    }
  }
  preprocessing {
    scaler {
      channelScale: 1.0
    }
  }
}

Please notice that the bias in the convolution is zero and the weight is an identity, so the convolution acts as an identity operation. The instance normalization layer is the same as in network 1; only the name of its input is changed to connect it to the conv layer. So we would expect the network output to be the same as for network 1. But it turns out that channel 0 comes out of the network unmodified. If I set gamma to 0.5, the output is simply halved.
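(If you want to reproduce the comparison from Python rather than on device, something like the following should work on macOS via coremltools' prediction API; the model file name here is just a placeholder:)

import numpy as np
from PIL import Image
import coremltools

# Random 8x8 RGB input image.
img = Image.fromarray(np.random.randint(0, 256, size=(8, 8, 3), dtype=np.uint8))

model = coremltools.models.MLModel("conv_then_instancenorm.mlmodel")  # placeholder
out = model.predict({"data": img})["output1"]

# Channel 0 should come back normalized (zero mean, unit variance over the
# 8x8 plane), but with the conv layer in front it is essentially the raw input.
print(out[0].mean(), out[0].std())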


So it seems the mean and variance are not being computed? The mean and variance specified in the protobuf are indeed supposed to be ignored, as stated in the documentation. It would be helpful if anyone could point out which part of my network is problematic.


Thanks in advance!

Accepted Reply

The issue is now fixed in Beta 4.

Thank you, Apple devs!

Replies

Yeah, this took me only a day and a half to figure out...

I filed a bug for it, but as it turns out it is a known issue (see the 0.4.0 release notes, https://forums.developer.apple.com/thread/81196):

Note: Popular Keras 2.0 models have a Convolution→BatchNorm→Activation structure which exposes an underlying issue in CoreML.framework which results in incorrect execution on older versions of Mac and iPhone hardware. This issue has been fixed and will be part of an upcoming developer seed.


If you inspect the structure of the network after it has been compiled (there are JSON files in the app bundle that describe the compiled network), you'll see that the compiler converts convolutions that are followed by a normalization into a single layer in CoreML. And there seems to be a bug in the implementation of that fused layer.
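(A rough way to dump those JSON descriptions from Python; the paths and the *.espresso.net file naming are from memory and may vary:)

import glob
import json
import os
import pprint

# Placeholder path: adjust to wherever your built .app and .mlmodelc end up.
mlmodelc = "/path/to/Build/Products/Debug-iphoneos/MyApp.app/MyModel.mlmodelc"

# Print the compiled network description(s) to see which layers got fused.
for path in glob.glob(os.path.join(mlmodelc, "*.net")):
    print(path)
    with open(path) as f:
        pprint.pprint(json.load(f))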


To work around that, I added a dummy no-op layer (a crop layer with size 0) between the two, which prevents the compiler from trying to optimize there. That's obviously not ideal, but a feasible workaround until Beta 3 arrives with a fix.
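(One way to add such a no-op crop is to patch the spec with coremltools before compiling; this is only a sketch, not necessarily how I did it. The file names are placeholders, and it assumes each instance-norm layer takes a single input from the preceding conv:)

import copy
import coremltools

spec = coremltools.utils.load_spec("model.mlmodel")        # placeholder name
nn = spec.neuralNetwork

old_layers = [copy.deepcopy(l) for l in nn.layers]
del nn.layers[:]

for layer in old_layers:
    if layer.WhichOneof("layer") == "batchnorm" and layer.batchnorm.instanceNormalization:
        # Insert a zero-size crop (a no-op) in front of the instance-norm
        # layer so the compiler doesn't fuse it with the preceding convolution.
        crop = nn.layers.add()
        crop.name = layer.name + "_noop_crop"
        crop.input.append(layer.input[0])
        crop.output.append(crop.name + "_output")
        crop.crop.cropAmounts.borderAmounts.add()   # height: crop 0 pixels
        crop.crop.cropAmounts.borderAmounts.add()   # width: crop 0 pixels

        # Re-add the instance-norm layer, rewired to read from the crop output.
        rewired = nn.layers.add()
        rewired.CopyFrom(layer)
        del rewired.input[:]
        rewired.input.append(crop.output[0])
    else:
        nn.layers.add().CopyFrom(layer)

coremltools.utils.save_spec(spec, "model_with_noop_crop.mlmodel")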

Thanks a lot! I have spent a whole day on it too and was frustrated.
