Support for image outputs

I'm currently porting image style transfer neural networks to CoreML and it works great so far. The only downside is that the output format seems to be limited to MLMultiArray, which I have to (slowly) convert back into an image.


Is there any chance we can get support for image outputs in the future? Or is there a way I can use the output data in Metal so I can do the conversion on the GPU myself?


Anyway, thanks for CoreML! It's great so far, and I can't wait to see what's coming in the future.

Accepted Reply

While the NeuralNetworkBuilder currently does not have options for image outputs, you can use coremltools to modify the resulting model so that the desired multiarray output is treated as an image.


Here is an example helper function:

def convert_multiarray_output_to_image(spec, feature_name, is_bgr=False):
    """
    Convert an output multiarray to be represented as an image
    This will modify the Model_pb spec passed in.
    Example:
        model = coremltools.models.MLModel('MyNeuralNetwork.mlmodel')
        spec = model.get_spec()
        convert_multiarray_output_to_image(spec,'imageOutput',is_bgr=False)
        newModel = coremltools.models.MLModel(spec)
        newModel.save('MyNeuralNetworkWithImageOutput.mlmodel')
    Parameters
    ----------
    spec: Model_pb
        The specification containing the output feature to convert
    feature_name: str
        The name of the multiarray output feature you want to convert
    is_bgr: boolean
        If the multiarray has 3 channels, set to True for BGR pixel order or False for RGB
    """
    for output in spec.description.output:
        if output.name != feature_name:
            continue
        if output.type.WhichOneof('Type') != 'multiArrayType':
            raise ValueError("%s is not a multiarray type" % output.name)
        array_shape = tuple(output.type.multiArrayType.shape)
        channels, height, width = array_shape
        from coremltools.proto import FeatureTypes_pb2 as ft
        if channels == 1:
            output.type.imageType.colorSpace = ft.ImageFeatureType.ColorSpace.Value('GRAYSCALE')
        elif channels == 3:
            if is_bgr:
                output.type.imageType.colorSpace = ft.ImageFeatureType.ColorSpace.Value('BGR')
            else:
                output.type.imageType.colorSpace = ft.ImageFeatureType.ColorSpace.Value('RGB')
        else:
            raise ValueError("Channel Value %d not supported for image inputs" % channels)
        output.type.imageType.width = width
        output.type.imageType.height = height


Note: Neural networks can output images from a layer (as a CVPixelBuffer), but the values are clamped to the range 0-255, i.e. values < 0 become 0 and values > 255 become 255.


You can also just keep the output as an MLMultiArray and index into pixels with something like this in Swift:

    // Reinterpret the MLMultiArray's backing store as a Double pointer.
    let imageData = UnsafeMutablePointer<Double>(OpaquePointer(imageMultiArray.dataPointer))

    // Strides for the (channel, height, width) layout.
    let channelStride = imageMultiArray.strides[0].intValue
    let yStride = imageMultiArray.strides[1].intValue
    let xStride = imageMultiArray.strides[2].intValue

    func pixelOffset(_ channel: Int, _ y: Int, _ x: Int) -> Int {
        return channel*channelStride + y*yStride + x*xStride
    }

    let topLeftGreenPixel = UInt8(imageData[pixelOffset(1, 0, 0)])
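
To go one step further, here is a minimal sketch that walks the whole array with the same stride math and assembles a CGImage on the CPU. It assumes a (3, height, width) Double array whose values already lie roughly in the 0-255 range; the helper name cgImage(from:) is just for illustration.

import CoreML
import CoreGraphics

func cgImage(from imageMultiArray: MLMultiArray) -> CGImage? {
    // Expecting the (channels, height, width) layout used above.
    let channels = imageMultiArray.shape[0].intValue
    let height   = imageMultiArray.shape[1].intValue
    let width    = imageMultiArray.shape[2].intValue
    guard channels == 3 else { return nil }

    let data = UnsafeMutablePointer<Double>(OpaquePointer(imageMultiArray.dataPointer))
    let cStride = imageMultiArray.strides[0].intValue
    let yStride = imageMultiArray.strides[1].intValue
    let xStride = imageMultiArray.strides[2].intValue

    // RGBA8 destination buffer; the alpha byte stays at 255.
    var pixels = [UInt8](repeating: 255, count: width * height * 4)
    for y in 0..<height {
        for x in 0..<width {
            let dst = (y * width + x) * 4
            for c in 0..<3 {
                let value = data[c * cStride + y * yStride + x * xStride]
                pixels[dst + c] = UInt8(max(0, min(255, value)))
            }
        }
    }

    return pixels.withUnsafeMutableBytes { buffer -> CGImage? in
        guard let context = CGContext(data: buffer.baseAddress,
                                      width: width,
                                      height: height,
                                      bitsPerComponent: 8,
                                      bytesPerRow: width * 4,
                                      space: CGColorSpaceCreateDeviceRGB(),
                                      bitmapInfo: CGImageAlphaInfo.noneSkipLast.rawValue)
        else { return nil }
        return context.makeImage()
    }
}
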

Replies

I just tried to get the result into a Metal buffer so I could convert it in a compute shader, but I'm running into the issue that Metal doesn't support doubles, and CoreML seems to always return doubles in the output (even if I tell it to use FLOAT32 in the spec).

How does CoreML actually do it under the hood when the GPU doesn't support double precision?
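
One possible workaround is to convert the Double values to Float32 on the CPU before handing them to Metal. Here is a minimal sketch, assuming the output is a Double-typed MLMultiArray; the helper name makeFloatBuffer is hypothetical:

import CoreML
import Metal

func makeFloatBuffer(from multiArray: MLMultiArray, device: MTLDevice) -> MTLBuffer? {
    let count = multiArray.count
    let doubles = UnsafeMutablePointer<Double>(OpaquePointer(multiArray.dataPointer))

    // Element-wise Double -> Float conversion; for large outputs,
    // vDSP_vdpsp from Accelerate would be faster.
    var floats = [Float](repeating: 0, count: count)
    for i in 0..<count {
        floats[i] = Float(doubles[i])
    }

    // Copy the converted values into a shared-storage Metal buffer.
    return device.makeBuffer(bytes: floats,
                             length: count * MemoryLayout<Float>.stride,
                             options: .storageModeShared)
}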

If your model outputs an image (i.e. something with width, height, and a depth of 3 or 4 channels), then Core ML can interpret that as an image. You need to pass a parameter for this in the coremltools conversion script so that Core ML knows the output should be treated as an image.

How do I do that? The NeuralNetworkBuilder only has a method for pre-processing image inputs, but not for post-processing outputs. If I try to convert the type of the output directly in the spec, the model compiles (and Xcode shows the format correctly), but the result is wrong.

I guess the output only becomes an image if you specify `class_labels` in the call to convert(), but you're not really building a classifier so that wouldn't work. So what I had in mind is not actually a solution to your problem.


This is why I prefer implementing neural networks with MPS. ;-)


I actually tried pretty much exactly that (tagging the output as an image, as in the accepted reply above). But my problem is that I can't seem to get the CVPixelBuffer back into a displayable format. I'm going to keep trying.


I still don't really understand how CoreML can give me doubles when it's doing its computation on the GPU, though...

What displayable format are you looking for? Here are some potentially useful methods for converting a CVPixelBuffer output into other representations:


Construct CIImage from CVPixelBuffer:

https://developer.apple.com/documentation/coreimage/ciimage/1438072-init


Construct UIImage from CIImage:

https://developer.apple.com/documentation/uikit/uiimage/1624114-init


Construct a CV Metal Texture from existing CoreVideo buffer:

https://developer.apple.com/documentation/corevideo/1456754-cvmetaltexturecachecreatetexture

https://developer.apple.com/documentation/corevideo/1456868-cvmetaltexturegettexture
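
For example, the first two of these can be chained directly. A minimal sketch, assuming pixelBuffer is the model's CVPixelBuffer output and displayableImage is just an illustrative name:

import UIKit
import CoreImage
import CoreVideo

func displayableImage(from pixelBuffer: CVPixelBuffer) -> UIImage {
    // Wrap the pixel buffer in a CIImage, then in a UIImage for display.
    let ciImage = CIImage(cvPixelBuffer: pixelBuffer)
    return UIImage(ciImage: ciImage)
}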

Thanks for your help, Michael.


It actually works! As it turns out, I kind of wasn't trying hard enough. I had already tried the CIImage approach, but the image simply didn't show, neither when inspecting it with Quick Look in Xcode nor when displaying it in a view. Then I checked the memory, and as it turns out the alpha channel of the result is 0. So if I render it into a new context with CGImageAlphaInfo.noneSkipLast, it works.


I can work with that, thanks!

To use the helper you provided, "convert_multiarray_output_to_image", does the model output dataType need to be DOUBLE or INT32, or does either work?

There is no requirement for the type, I think. The output has to be in the shape (channels, height, width) and have either 1 or 3 channels. I guess the internal conversion step will handle the rest.

Thanks a lot. I will try this out.

Edit:

I have solved the problem by taking another approach. I take CVPixelBufferGetBaseAddress(myCVPixelBuffer) and then directly set the alpha channel to 255 to "remove" it. This approach might be a bit raw, as it uses UnsafeRawPointer and assumes the alpha bytes are located at offsets 3, 7, 11, ... But at least it works.
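
A minimal sketch of that approach, assuming a 4-byte-per-pixel format (e.g. BGRA/RGBA) where the alpha byte is the fourth component of each pixel; the helper name forceOpaqueAlpha is made up:

import CoreVideo

func forceOpaqueAlpha(in pixelBuffer: CVPixelBuffer) {
    CVPixelBufferLockBaseAddress(pixelBuffer, [])
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, []) }

    guard let base = CVPixelBufferGetBaseAddress(pixelBuffer) else { return }
    let width = CVPixelBufferGetWidth(pixelBuffer)
    let height = CVPixelBufferGetHeight(pixelBuffer)
    let bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer)
    let bytes = base.assumingMemoryBound(to: UInt8.self)

    // Walk row by row (rows may be padded, hence bytesPerRow)
    // and set every alpha byte (offsets 3, 7, 11, ...) to 255.
    for y in 0..<height {
        let row = bytes + y * bytesPerRow
        for x in 0..<width {
            row[x * 4 + 3] = 255
        }
    }
}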


[Old message]

I think I have the same problem of getting a "not displayable format"; in other words, the image does not show in Quick Look. I can already get the image when an MLMultiArray is returned, but not when the output is converted to a CVPixelBuffer in the mlmodel. Sorry, I am quite new to iOS development, so I don't understand how to use "CGImageAlphaInfo.noneSkipLast". I guessed that I need to do the conversion CVPixelBuffer -> CIImage -> CGImage, then create a CGContext with CGImageAlphaInfo.noneSkipLast, draw the CGImage into the CGContext, and finally get a CGImage from the CGContext.


But somehow the image comes out black. Is this the correct approach? Could you please share the steps of your approach?


Thanks for all your help.

Hey Michael, can I use this script to convert my MLMultiArray input to be an image instead? I'm guessing that it needs to consider

spec.description.input

in this case.

Just for your reference, if your network's original output shape is [3, 511, 511], then after converting the output to a CVPixelBuffer, the diff in the spec is only:


--- oil_mlmodel_yespost.pb 2017-07-10 11:00:21.078301960 +0800
+++ oil_mlmodel_yespost_cvbuffer.pb 2017-07-10 10:59:38.374233180 +0800
@@ -13,11 +13,10 @@
   output {
     name: "output"
     type {
-      multiArrayType {
-        shape: 3
-        shape: 511
-        shape: 511
-        dataType: DOUBLE
+      imageType {
+        width: 511
+        height: 511
+        colorSpace: RGB
       }
     }
   }



And in my case I just need to run convert_multiarray_output_to_image(tm, 'output'), where tm is my model spec. I don't need to specify the input.

For me this works without changing the pixel buffer (output is your CVPixelBuffer):


// Lock the buffer while reading its base address.
CVPixelBufferLockBaseAddress(output, .readOnly)

let width = CVPixelBufferGetWidth(output)
let height = CVPixelBufferGetHeight(output)
let data = CVPixelBufferGetBaseAddress(output)!

// noneSkipLast tells Core Graphics to ignore the (zeroed) alpha byte in each pixel.
let outContext = CGContext(data: data, width: width, height: height, bitsPerComponent: 8, bytesPerRow: CVPixelBufferGetBytesPerRow(output), space: CGColorSpaceCreateDeviceRGB(), bitmapInfo: CGImageAlphaInfo.noneSkipLast.rawValue)!
let outImage = outContext.makeImage()!

CVPixelBufferUnlockBaseAddress(output, .readOnly)
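
If you need a UIImage for display, wrapping the resulting CGImage should be enough (a small assumed addition to the snippet above):

// Wrap the CGImage from the snippet above for display, e.g. in a UIImageView.
let uiImage = UIImage(cgImage: outImage)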