Support for image outputs

I'm currently porting image style transfer neural networks to CoreML and it works great so far. The only downside is that the output format seems to be limited to MLMultiArray, which I have to (slowly) convert back into an image.


Is there any chance we can get support for image outputs in the future? Or is there a way I can use the output data in Metal so I can do the conversion on the GPU myself?


Anyways, thanks for CoreML! It's great so far and I can't wait to see what's coming in the future.

Answered by Frameworks Engineer in 241998022

I just tried to get the result into a Metal buffer to convert it in a compute shader, but I'm running into the issue that Metal doesn't support doubles. And CoreML seems to always return doubles in the output (even if I tell it to use FLOAT32 in the spec).

How does CoreML actually do it under the hood when the GPU doesn't support double precision?

If your model outputs an image (i.e. something with width, height, and a depth of 3 or 4 channels), then Core ML can interpret that as an image. You need to pass a parameter for this in the coremltools conversion script, so that Core ML knows this output should be interpreted as an image.

How do I do that? The NeuralNetworkBuilder only has a method for pre-processing image inputs, but not one for post-processing outputs. If I try to convert the type of the output directly in the spec, the model compiles (and Xcode shows the format correctly), but the result is wrong.

I guess the output only becomes an image if you specify `class_labels` in the call to convert(), but you're not really building a classifier so that wouldn't work. So what I had in mind is not actually a solution to your problem.


This is why I prefer implementing neural networks with MPS. ;-)

Accepted Answer

While the NeuralNetworkBuilder currently does not have options for image outputs, you can use coremltools to modify the resulting model so that the desired multiarray output is treated as an image.


Here is an example helper function:

def convert_multiarray_output_to_image(spec, feature_name, is_bgr=False):
    """
    Convert an output multiarray to be represented as an image
    This will modify the Model_pb spec passed in.
    Example:
        model = coremltools.models.MLModel('MyNeuralNetwork.mlmodel')
        spec = model.get_spec()
        convert_multiarray_output_to_image(spec,'imageOutput',is_bgr=False)
        newModel = coremltools.models.MLModel(spec)
        newModel.save('MyNeuralNetworkWithImageOutput.mlmodel')
    Parameters
    ----------
    spec: Model_pb
        The specification containing the output feature to convert
    feature_name: str
        The name of the multiarray output feature you want to convert
    is_bgr: boolean
        If the multiarray has 3 channels, set to True for BGR pixel order or False for RGB
    """
    for output in spec.description.output:
        if output.name != feature_name:
            continue
        if output.type.WhichOneof('Type') != 'multiArrayType':
            raise ValueError("%s is not a multiarray type" % output.name)
        array_shape = tuple(output.type.multiArrayType.shape)
        channels, height, width = array_shape
        from coremltools.proto import FeatureTypes_pb2 as ft
        if channels == 1:
            output.type.imageType.colorSpace = ft.ImageFeatureType.ColorSpace.Value('GRAYSCALE')
        elif channels == 3:
            if is_bgr:
                output.type.imageType.colorSpace = ft.ImageFeatureType.ColorSpace.Value('BGR')
            else:
                output.type.imageType.colorSpace = ft.ImageFeatureType.ColorSpace.Value('RGB')
        else:
            raise ValueError("Channel Value %d not supported for image inputs" % channels)
        output.type.imageType.width = width
        output.type.imageType.height = height


Note: Neural networks can output images from a layer (as a CVPixelBuffer), but the values are clamped between 0 and 255, i.e. values < 0 become 0 and values > 255 become 255.


You can also just keep the output as an MLMultiArray and index into pixels with something like this in Swift:

    let imageData = UnsafeMutablePointer<Double>(OpaquePointer(imageMultiArray.dataPointer))

    let channelStride = imageMultiArray.strides[0].intValue
    let yStride = imageMultiArray.strides[1].intValue
    let xStride = imageMultiArray.strides[2].intValue

    func pixelOffset(_ channel: Int, _ y: Int, _ x: Int) -> Int {
        return channel*channelStride + y*yStride + x*xStride
    }

    let topLeftGreenPixel = UInt8(imageData[pixelOffset(1, 0, 0)])
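
If you want to go all the way to a displayable image from the multiarray, here is a minimal sketch along the same lines. It assumes a 3-channel (channels, height, width) array in RGB order whose values are already in the 0...255 range, interleaves the planes into RGBA bytes, and wraps them in a CGImage:

    import CoreML
    import CoreGraphics

    // Minimal sketch: copy a (3, height, width) Double multiarray (values assumed
    // to already be in 0...255) into interleaved RGBA bytes and wrap them in a CGImage.
    func cgImage(from multiArray: MLMultiArray) -> CGImage? {
        guard multiArray.shape.count == 3, multiArray.shape[0].intValue == 3 else { return nil }
        let height = multiArray.shape[1].intValue
        let width = multiArray.shape[2].intValue

        let cStride = multiArray.strides[0].intValue
        let yStride = multiArray.strides[1].intValue
        let xStride = multiArray.strides[2].intValue
        let source = UnsafeMutablePointer<Double>(OpaquePointer(multiArray.dataPointer))

        // Interleave the planar channels into RGBA, forcing alpha to 255.
        var pixels = [UInt8](repeating: 255, count: width * height * 4)
        for y in 0..<height {
            for x in 0..<width {
                for c in 0..<3 {
                    let value = source[c * cStride + y * yStride + x * xStride]
                    pixels[(y * width + x) * 4 + c] = UInt8(max(0, min(255, value)))
                }
            }
        }

        guard let data = CFDataCreate(nil, pixels, pixels.count),
              let provider = CGDataProvider(data: data) else { return nil }
        return CGImage(width: width, height: height,
                       bitsPerComponent: 8, bitsPerPixel: 32, bytesPerRow: width * 4,
                       space: CGColorSpaceCreateDeviceRGB(),
                       bitmapInfo: CGBitmapInfo(rawValue: CGImageAlphaInfo.noneSkipLast.rawValue),
                       provider: provider, decode: nil,
                       shouldInterpolate: false, intent: .defaultIntent)
    }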

I actually tried pretty much exactly that (tagging the output as an image). But my problem is that I can't seem to get the CVPixelBuffer back into some displayable format. I'm going to keep trying.


I still don't really understand how CoreML can give me doubles when it's doing its computation on the GPU, though...

What displayable format are you looking for? Here are some potentially useful methods for converting a CVPixelBuffer output into other representations:


Construct CIImage from CVPixelBuffer:

https://developer.apple.com/documentation/coreimage/ciimage/1438072-init


Construct UIImage from CIImage:

https://developer.apple.com/documentation/uikit/uiimage/1624114-init


Construct a CV Metal Texture from existing CoreVideo buffer:

https://developer.apple.com/documentation/corevideo/1456754-cvmetaltexturecachecreatetexture

https://developer.apple.com/documentation/corevideo/1456868-cvmetaltexturegettexture
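
For example, a minimal sketch (assuming UIKit is available) that takes the model's CVPixelBuffer output through CIImage to a UIImage:

import CoreImage
import UIKit

// Minimal sketch: wrap the CVPixelBuffer in a CIImage, render it to a CGImage,
// and wrap that in a UIImage so it is safely displayable.
func makeUIImage(from pixelBuffer: CVPixelBuffer) -> UIImage? {
    let ciImage = CIImage(cvPixelBuffer: pixelBuffer)
    let context = CIContext()
    guard let cgImage = context.createCGImage(ciImage, from: ciImage.extent) else { return nil }
    return UIImage(cgImage: cgImage)
}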

Thanks for your help, Michael.


It actually works! As it turns out, I kinda wasn't trying hard enough. I had tried the CIImage approach, but the image simply didn't show, neither when inspecting it with Quick Look in Xcode nor when displaying it in a view. Then I checked the memory, and as it turns out, the alpha channel of the result is 0. So if I render it into a new context with CGImageAlphaInfo.noneSkipLast, it works.


I can work with that, thanks!

To use the helper you provided, "convert_multiarray_output_to_image", does the model output dataType need to be DOUBLE or INT32, or can it be either?

There is no requirement for the type, I think. The array has to be in the shape (channels, height, width) and have either 1 or 3 channels. I guess the internal conversion step will handle the rest.

Thanks a lot. I will try this out.

Edit:

I have solved the problem by taking another approach. I take CVPixelBufferGetBaseAddress(myCVPixelBuffer) and then directly set the alpha channel to 255 to "remove" it. This approach might be a bit raw, as it uses UnsafeRawPointer and assumes the alpha bytes are located at offsets 3, 7, 11, ... But at least it works.
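
In code, that approach could look roughly like this (a sketch that assumes a 4-byte-per-pixel buffer such as 32BGRA, with alpha as the last byte of each pixel):

import CoreVideo

// Sketch of the approach described above: write 255 into every alpha byte.
// Assumes a 4-byte-per-pixel format (e.g. kCVPixelFormatType_32BGRA) where
// alpha is the last component of each pixel.
func forceOpaque(_ pixelBuffer: CVPixelBuffer) {
    CVPixelBufferLockBaseAddress(pixelBuffer, [])
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, []) }

    guard let base = CVPixelBufferGetBaseAddress(pixelBuffer) else { return }
    let bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer)
    let width = CVPixelBufferGetWidth(pixelBuffer)
    let height = CVPixelBufferGetHeight(pixelBuffer)
    let bytes = base.assumingMemoryBound(to: UInt8.self)

    for y in 0..<height {
        for x in 0..<width {
            bytes[y * bytesPerRow + x * 4 + 3] = 255  // alpha byte
        }
    }
}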


[Old message]

I think I have the same problem of getting a "not displayable format"; in other words, the image does not show in Quick Look. I can already get the image when an MLMultiArray is returned, but not when the output is converted to a CVPixelBuffer in the mlmodel. Sorry, I am quite new to iOS development, so I don't understand how to use "CGImageAlphaInfo.noneSkipLast". I guessed that I need to do the conversion CVPixelBuffer -> CIImage -> CGImage, then create a CGContext with CGImageAlphaInfo.noneSkipLast, draw the CGImage into that context, and finally get a CGImage back from the CGContext.


But somehow the image comes out black. Is this the correct approach? Could you please share some steps of your approach?


Thanks for all your help.

Hey Michael, can I use this script to convert my MLMultiArray input to an image instead? I'm guessing that it needs to consider

spec.description.input

in this case.

Just for your reference: if your network's original output shape is [3, 511, 511], then after conversion to a CVPixelBuffer output, the diff is only:


--- oil_mlmodel_yespost.pb 2017-07-10 11:00:21.078301960 +0800
+++ oil_mlmodel_yespost_cvbuffer.pb 2017-07-10 10:59:38.374233180 +0800
@@ -13,11 +13,10 @@
   output {
     name: "output"
       type {
-        multiArrayType {
-          shape: 3
-          shape: 511
-          shape: 511
-          dataType: DOUBLE
+        imageType {
+          width: 511
+          height: 511
+          colorSpace: RGB
         }
      }
   }



And in my case I just need to run convert_multiarray_output_to_image(tm, 'output'), where tm is my model. I don't need to specify the input.

For me this works without changing the pixel buffer (output is your CVPixelBuffer):


CVPixelBufferLockBaseAddress(output, .readOnly)

let width = CVPixelBufferGetWidth(output)
let height = CVPixelBufferGetHeight(output)
let data = CVPixelBufferGetBaseAddress(output)!

let outContext = CGContext(data: data, width: width, height: height, bitsPerComponent: 8, bytesPerRow: CVPixelBufferGetBytesPerRow(output), space: CGColorSpaceCreateDeviceRGB(), bitmapInfo: CGImageAlphaInfo.noneSkipLast.rawValue)!
let outImage = outContext.makeImage()!

CVPixelBufferUnlockBaseAddress(output, .readOnly)
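
If you then need a UIImage for display, the resulting CGImage can be wrapped directly (imageView is a hypothetical UIImageView here):

import UIKit

// Wrap the CGImage produced above; `imageView` is a hypothetical UIImageView.
let uiImage = UIImage(cgImage: outImage)
imageView.image = uiImage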

Thanks!

Hi Michael, could the team responsible for the Core ML compiler set the alpha channel to 255 by default? Though it is not very inconvenient to set it ourselves, I think it will confuse more people as time goes by.


Could it be considered a bug?

I guess my question was about converting the input type MLMultiArray to an image, instead of the output.

Sorry, I misunderstood that.


Conversion of the input to an image is usually achieved by supplying the parameter "image_input_names" in the coremltools conversion function. Why didn't you take that approach?

Oh, I didn't take that approach because a couple of users on Stack Overflow reported that it didn't work and provided a link to this thread. I guess it's my fault for not trying it before asking.


Thanks again.

Thanks for those details, it really helped! One question though:


How would you apply scale and biases to the image data before conversion? For instance, my network outputs in the range [-1, 1] and I need to convert that to [0, 255].


When using the Keras converter, the input image can thankfully be scaled and biased. Can that code be reused somehow?

I have tackled a similar post-processing problem (subtracting the VGG constants) by manually inserting a 1x1 convolution. For your particular problem, you could try adding a conv layer with a 1x1 kernel, 3 input channels, and 3 output channels, with weights

[127.5, 0, 0,
 0, 127.5, 0,
 0, 0, 127.5]

and bias

[127.5, 127.5, 127.5]


Place it after your model's final layer.


This operation will scale each channel separately into [-127.5, 127.5] and then add 127.5 to each channel. I have not tried it; it's just to give you a direction to work on.


As an aside, resetting the alpha channel is no longer required as of Xcode beta 5.

There's actually a bias layer in the spec that does exactly that. No need for the work-around. Here is my helper for the NeuralNetworkBuilder:


def _add_bias(self, name, bias, shape, input_name, output_name):
    """
    Add a bias layer to the model.

    Parameters
    ----------
    name: str
        The name of this layer.
    bias: numpy.array
        The biases to apply to the inputs.
        The size must be equal to the product of the ``shape`` dimensions.
    shape: tuple
        The shape of the bias. Must be one of the following:
        ``[1]``, ``[C]``, ``[1, H, W]`` or ``[C, H, W]``.
    input_name: str
        The input blob name of this layer.
    output_name: str
        The output blob name of this layer.
    """
    nn_spec = self.nn_spec
    # Add a new bias layer
    spec_layer = nn_spec.layers.add()
    spec_layer.name = name
    spec_layer.input.append(input_name)
    spec_layer.output.append(output_name)
    spec_layer_params = spec_layer.bias
    spec_layer_params.bias.floatValue.extend(map(float, bias.flatten()))
    spec_layer_params.shape.extend(shape)

# Attach the helper to coremltools' NeuralNetworkBuilder
from coremltools.models.neural_network import NeuralNetworkBuilder as _NeuralNetworkBuilder
_NeuralNetworkBuilder.add_bias = _add_bias

I'm trying to convert the output of my mlmodel to a UIImage, but it's not working...

Output of the mlmodel: Image (Grayscale, width x height)


guard let results = request.results as? [VNPixelBufferObservation] else { fatalError("Fatal error") }
print(String(describing: type(of: results)))
print(String(describing: type(of: results[0])))
let ciImage = CIImage(cvPixelBuffer: results[0].pixelBuffer)


Outputs:

Array<VNPixelBufferObservation>

VNPixelBufferObservation


Error occurs on line 4:

Thread 1: EXC_BAD_ACCESS (code=1, address=0xe136dbec8)


-------------------------------------------------------------------------------------------

I'm also trying it while keeping the mlmodel output as a MultiArray.


Output of the mlmodel: MultiArray (Double, 1 x width x height)


guard let results = request.results as? [VNCoreMLFeatureValueObservation] else { fatalError("Fatal error") }

print(String(describing: type(of: results)))
print(String(describing: type(of: results[0])))
print(String(describing: type(of: results[0].featureValue)))
print(results[0].featureValue)
print(results[0].featureValue.multiArrayValue)

let imageMultiArray = results[0].featureValue.multiArrayValue
let imageData = UnsafeMutablePointer<Double>(OpaquePointer(imageMultiArray?.dataPointer))
let channelStride = imageMultiArray?.strides[0].intValue
let yStride = imageMultiArray?.strides[1].intValue
let xStride = imageMultiArray?.strides[2].intValue

func pixelOffset(_ channel: Int, _ y: Int, _ x: Int) -> Int {
  return channel*channelStride! + y*yStride! + x*xStride!
}

let topLeftGreenPixel = Unit8(imageData![pixelOffset(1,0,0)])


Outputs:

Array<VNCoreMLFeatureValueObservation>

VNCoreMLFeatureValueObservation

Optional<MLFeatureValue>

Optional(MultiArray : Double 1 x width x height array)

Optional(Double 1 x width x height array)


Error occurs on line 19: Use of unresolved identifier 'Unit8'

Should it be replaced by UInt8? And how do I convert the result to a UIImage?



Thank you in advance for any help!

Yes, Unit8 is a typo and should be UInt8.


But regardless, you should really try to adjust the spec of your model to produce image outputs (instead of a multiarray). That way you should be able to get the pixel buffer directly from the prediction. In the answer above you can see the Python function that can do that for you.
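
For instance, once the output is typed as an image, a Vision request hands back VNPixelBufferObservation instead of VNCoreMLFeatureValueObservation. A minimal sketch of a completion handler under that assumption:

import Vision
import CoreImage

// Minimal sketch: with an image-typed output, the observation carries a
// CVPixelBuffer that can be wrapped in a CIImage and displayed directly.
func handleRequest(_ request: VNRequest, error: Error?) {
    guard let observation = request.results?.first as? VNPixelBufferObservation else { return }
    let ciImage = CIImage(cvPixelBuffer: observation.pixelBuffer)
    // Render ciImage to a CGImage/UIImage with a CIContext as needed.
}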
