Support for image outputs

I'm currently porting image style transfer neural networks to CoreML and it works great so far. The only downside is that the only output format seems to be an MLMultiArray, which I have to (slowly) convert back into an image.


Is there any chance we can get support for image outputs in the future? Or is there a way I can use the output data in Metal so I can do the conversion on the GPU myself?


Anyways, thanks for CoreML! It's great so far and I can't wait to see what's coming in the future.

Accepted Reply

While the NeuralNetworkBuilder currently does not have options for image outputs, you can use coremltools to modify the resulting model so that the desired multiarray output is treated as an image.


Here is an example helper function:

def convert_multiarray_output_to_image(spec, feature_name, is_bgr=False):
    """
    Convert an output multiarray to be represented as an image
    This will modify the Model_pb spec passed in.
    Example:
        model = coremltools.models.MLModel('MyNeuralNetwork.mlmodel')
        spec = model.get_spec()
        convert_multiarray_output_to_image(spec,'imageOutput',is_bgr=False)
        newModel = coremltools.models.MLModel(spec)
        newModel.save('MyNeuralNetworkWithImageOutput.mlmodel')
    Parameters
    ----------
    spec: Model_pb
        The specification containing the output feature to convert
    feature_name: str
        The name of the multiarray output feature you want to convert
    is_bgr: boolean
        If the multiarray has 3 channels, set to True for BGR pixel order or False for RGB
    """
    for output in spec.description.output:
        if output.name != feature_name:
            continue
        if output.type.WhichOneof('Type') != 'multiArrayType':
            raise ValueError("%s is not a multiarray type" % output.name)
        array_shape = tuple(output.type.multiArrayType.shape)
        channels, height, width = array_shape
        from coremltools.proto import FeatureTypes_pb2 as ft
        if channels == 1:
            output.type.imageType.colorSpace = ft.ImageFeatureType.ColorSpace.Value('GRAYSCALE')
        elif channels == 3:
            if is_bgr:
                output.type.imageType.colorSpace = ft.ImageFeatureType.ColorSpace.Value('BGR')
            else:
                output.type.imageType.colorSpace = ft.ImageFeatureType.ColorSpace.Value('RGB')
        else:
            raise ValueError("Channel Value %d not supported for image inputs" % channels)
        output.type.imageType.width = width
        output.type.imageType.height = height


Note: neural networks can output images from a layer (as a CVPixelBuffer), but the values are clamped between 0 and 255, i.e. values < 0 become 0 and values > 255 become 255.


You can also just keep the output an MLMultiArray and index into the pixels with something like this in Swift:

    // imageMultiArray is the MLMultiArray returned by the model (shape: channels x height x width)
    let imageData = UnsafeMutablePointer<Double>(OpaquePointer(imageMultiArray.dataPointer))

    let channelStride = imageMultiArray.strides[0].intValue
    let yStride = imageMultiArray.strides[1].intValue
    let xStride = imageMultiArray.strides[2].intValue

    func pixelOffset(_ channel: Int, _ y: Int, _ x: Int) -> Int {
        return channel*channelStride + y*yStride + x*xStride
    }

    // Channel 1 is the green channel for an RGB output
    let topLeftGreenPixel = UInt8(imageData[pixelOffset(1, 0, 0)])

Replies

Thanks!

Hi Michael, could the team responsible for the CoreML compiler set the alpha channel to 255 by default? Though it is not very inconvenient to set it ourselves, I think it will confuse more people as time goes by.


Could it be considered a bug?

I guess my question is about converting the input type from MLMultiArray to Image, rather than the output.

Sorry I misunderstood that.


Conversion of the input to an image is usually achieved by supplying the parameter "image_input_names" to the coremltools conversion function, as in the sketch below. Why didn't you take that approach?
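For example, something along these lines (the file and feature names are placeholders for your own model):

# Hypothetical example: declare the model input as an image at conversion time
import coremltools

coreml_model = coremltools.converters.keras.convert(
    'MyStyleTransfer.h5',           # placeholder path to your Keras model
    input_names=['image'],
    image_input_names=['image'],    # treat this input as an image instead of a multiarray
    output_names=['imageOutput'])
coreml_model.save('MyStyleTransfer.mlmodel')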

Oh, I didn't take that approach because a couple of users on Stack Overflow reported that it didn't work and provided a link to this thread. I guess it's my fault for not trying it before asking.


Thanks again.

Thanks for those details, they really helped! One question though:


How would you apply scale and biases to the image data before conversion? For instance, my network outputs in the range [-1, 1] and I need to convert that to [0, 255].


When using the Keras converter, the input image can thankfully be scaled and biased. Can that code be reused somehow?
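(For reference, this is the input-side preprocessing I mean; a rough sketch with placeholder names, using the Keras converter's preprocessing parameters:)

# Hypothetical sketch: scale/bias the *input* image at conversion time,
# e.g. to map [0, 255] pixels into a [-1, 1] range
import coremltools

coreml_model = coremltools.converters.keras.convert(
    'MyStyleTransfer.h5',            # placeholder path
    input_names=['image'],
    image_input_names=['image'],
    image_scale=2 / 255.0,           # pixel * 2/255 ...
    red_bias=-1.0,
    green_bias=-1.0,
    blue_bias=-1.0)                  # ... - 1.0  ->  [-1, 1]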

I have tackled a similar post-processing problem (subtracting the VGG constants) by manually inserting a 1x1 convolution. For your particular problem, you may try adding a 1x1x3x3 (1-by-1 kernel, 3 image channels) conv layer with weights

[127.5,   0,     0,
    0,   127.5,   0,
    0,     0,   127.5]

and bias

[127.5, 127.5, 127.5]


Place it after your model's final layer.


This operation will scale each channel separately into [-127.5, 127.5] and then add 127.5 to each channel. I have not tried it; it's just to give you a direction to work on.
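A rough, untested coremltools sketch of what that could look like (the blob names 'rawOutput' and 'imageOutput' are placeholders for the actual blob names in your spec):

import numpy as np
import coremltools

model = coremltools.models.MLModel('MyNeuralNetwork.mlmodel')
spec = model.get_spec()
nn = spec.neuralNetwork  # or spec.neuralNetworkClassifier / spec.neuralNetworkRegressor

# Append a 1x1 convolution after the current last layer. Its input must be the
# blob the network currently produces ('rawOutput' here), and its output must
# match the name declared in spec.description.output.
layer = nn.layers.add()
layer.name = 'rescale_to_0_255'
layer.input.append('rawOutput')
layer.output.append('imageOutput')

conv = layer.convolution
conv.outputChannels = 3
conv.kernelChannels = 3
conv.nGroups = 1
conv.kernelSize.extend([1, 1])
conv.stride.extend([1, 1])
conv.dilationFactor.extend([1, 1])
conv.valid.SetInParent()  # padding type; irrelevant for a 1x1 kernel
conv.hasBias = True

# Weight layout is [outputChannels, kernelChannels, kernelHeight, kernelWidth]:
# a diagonal of 127.5 scales each channel, then the bias shifts it by 127.5.
conv.weights.floatValue.extend(map(float, (127.5 * np.eye(3)).flatten()))
conv.bias.floatValue.extend([127.5, 127.5, 127.5])

coremltools.models.MLModel(spec).save('MyNeuralNetworkRescaled.mlmodel')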


As an aside, resetting the alpha channel is no longer required as of Xcode beta 5.

There's actually a bias layer in the spec that does exactly that. No need for the work-around. Here is my helper for the NeuralNetworkBuilder:


def _add_bias(self, name, bias, shape, input_name, output_name):
    """
    Add bias layer to the model.

    Parameters
    ----------
    name: str
        The name of this layer.
    bias: numpy.array
        The biases to apply to the inputs.
        The size must be equal to the product of the ``shape`` dimensions.
    shape: tuple
        The shape of the bias. Must be one of the following:
        ``[1]``, ``[C]``, ``[1, H, W]`` or ``[C, H, W]``.
    input_name: str
        The input blob name of this layer.
    output_name: str
        The output blob name of this layer.
    """
    nn_spec = self.nn_spec

    # Add a new bias layer
    spec_layer = nn_spec.layers.add()
    spec_layer.name = name
    spec_layer.input.append(input_name)
    spec_layer.output.append(output_name)
    spec_layer_params = spec_layer.bias
    spec_layer_params.bias.floatValue.extend(map(float, bias.flatten()))
    spec_layer_params.shape.extend(shape)

_NeuralNetworkBuilder.add_bias = _add_bias

I'm trying to convert the output of my mlmodel to a UIImage, but it's not working...

Output of the mlmodel: Image (Grayscale, width x height)


guard let results = request.results as? [VNPixelBufferObservation] else { fatalError("Fatal error") }
print(String(describing: type(of: results)))
print(String(describing: type(of: results[0])))
let ciImage = CIImage(cvPixelBuffer: results[0].pixelBuffer)


Outputs:

Array<VNPixelBufferObservation>

VNPixelBufferObservation


The error occurs on line 4 (the CIImage line):

Thread 1: EXC_BAD_ACCESS (code=1, address=0xe136dbec8)


-------------------------------------------------------------------------------------------

I'm also trying to keep the mlmodel output as a MultiArray.


Output of the mlmodel: MultiArray (Double, 1 x width x height)


guard let results = request.results as? [VNCoreMLFeatureValueObservation] else { fatalError("Fatal error") }

print(String(describing: type(of: results)))
print(String(describing: type(of: results[0])))
print(String(describing: type(of: results[0].featureValue)))
print(results[0].featureValue)
print(results[0].featureValue.multiArrayValue)

let imageMultiArray = results[0].featureValue.multiArrayValue
let imageData = UnsafeMutablePointer<Double>(OpaquePointer(imageMultiArray?.dataPointer))
let channelStride = imageMultiArray?.strides[0].intValue
let yStride = imageMultiArray?.strides[1].intValue
let xStride = imageMultiArray?.strides[2].intValue

func pixelOffset(_ channel: Int, _ y: Int, _ x: Int) -> Int {
  return channel*channelStride! + y*yStride! + x*xStride!
}

let topLeftGreenPixel = Unit8(imageData![pixelOffset(1,0,0)])


Outputs:

Array<VNCoreMLFeatureValueObservation>

VNCoreMLFeatureValueObservation

Optional<MLFeatureValue>

Optional(MultiArray : Double 1 x width x height array)

Optional(Double 1 x width x height array)


The error occurs on line 19 (the topLeftGreenPixel line): Use of unresolved identifier 'Unit8'

Should it be replaced by UInt8? And how do I convert the result to a UIImage?



Thank you for any help in advance!

Yes, Unit8 is a typo and should be UInt8.


But regardless, you should really try to adjust the spec of your model to produce an image output (instead of a multi-array). This way you should be able to get the pixel buffer directly from the prediction. In the answer above you can see the Python function that can do that for you.

Thank you for your response.


The output of my mlmodel is already an image (Grayscale, width x height).

No error occurs when getting the results on line 1.

But an error occurs when I try to access the pixelBuffer on line 4:

Error message: Thread 1: EXC_BAD_ACCESS (code=1, address=0xe136dbec8)

guard let results = request.results as? [VNPixelBufferObservation] else { fatalError("Fatal error") }
print(String(describing: type(of: results))) //-> Array<VNPixelBufferObservation>
print(String(describing: type(of: results[0]))) //-> VNPixelBufferObservation
let ciImage = CIImage(cvPixelBuffer: results[0].pixelBuffer)


Thank you.

Ok, that's indeed strange. Do you get any console output?


Also maybe try to use your model with CoreML directly, without Vision. Do you get the same error there?

print(String(describing: type(of: results)))
print(String(describing: type(of: results[0])))
let ciImage = CIImage(cvPixelBuffer: results[0].pixelBuffer)

Output is

Array<VNPixelBufferObservation>

VNPixelBufferObservation

Thread 1: EXC_BAD_ACCESS (code=1, address=0xe136dbec8)


How do I use CoreML directly, without Vision? Any URL?

It's a bit tedious because you have to do the pixel buffer conversion yourself. Check out the repo hollance/CoreMLHelpers on GitHub to see how it's done (sorry for no link, but I wanted to avoid waiting for moderation).

It worked! Thank you.