Support for image outputs

I'm currently porting image style transfer neural networks to Core ML, and it works great so far. The only downside is that the sole output format seems to be an MLMultiArray, which I have to (slowly) convert back into an image.


Is there any chance we can get support for image outputs in the future? Or is there a way I can use the output data in Metal so I can do the conversion on the GPU myself?


Anyway, thanks for Core ML! It's great so far, and I can't wait to see what's coming in the future.

Accepted Reply

While the NeuralNetworkBuilder currently does not have options for image outputs, you can use coremltools to modify the resulting model so that the desired multiarray output is treated as an image.


Here is an example helper function:

def convert_multiarray_output_to_image(spec, feature_name, is_bgr=False):
    """
    Convert an output multiarray to be represented as an image.
    This will modify the Model_pb spec passed in.
    Example:
        model = coremltools.models.MLModel('MyNeuralNetwork.mlmodel')
        spec = model.get_spec()
        convert_multiarray_output_to_image(spec, 'imageOutput', is_bgr=False)
        newModel = coremltools.models.MLModel(spec)
        newModel.save('MyNeuralNetworkWithImageOutput.mlmodel')
    Parameters
    ----------
    spec: Model_pb
        The specification containing the output feature to convert
    feature_name: str
        The name of the multiarray output feature you want to convert
    is_bgr: boolean
        If the multiarray has 3 channels, set to True for BGR pixel order,
        or False for RGB
    """
    from coremltools.proto import FeatureTypes_pb2 as ft
    for output in spec.description.output:
        if output.name != feature_name:
            continue
        if output.type.WhichOneof('Type') != 'multiArrayType':
            raise ValueError("%s is not a multiarray type" % output.name)
        channels, height, width = tuple(output.type.multiArrayType.shape)
        if channels == 1:
            output.type.imageType.colorSpace = ft.ImageFeatureType.ColorSpace.Value('GRAYSCALE')
        elif channels == 3:
            if is_bgr:
                output.type.imageType.colorSpace = ft.ImageFeatureType.ColorSpace.Value('BGR')
            else:
                output.type.imageType.colorSpace = ft.ImageFeatureType.ColorSpace.Value('RGB')
        else:
            raise ValueError("Channel value %d not supported for image outputs" % channels)
        output.type.imageType.width = width
        output.type.imageType.height = height


Note: Neural networks can output images from a layer (as CVPixelBuffer), but the values are clamped between 0 and 255, i.e. values < 0 become 0 and values > 255 become 255.


You can also just keep the output an MLMultiArray and index into pixels with something like this in Swift:

    // Assumes a (channel, y, x) layout, matching the strides below.
    let imageData = UnsafeMutablePointer<Double>(OpaquePointer(imageMultiArray.dataPointer))

    let channelStride = imageMultiArray.strides[0].intValue
    let yStride = imageMultiArray.strides[1].intValue
    let xStride = imageMultiArray.strides[2].intValue

    func pixelOffset(_ channel: Int, _ y: Int, _ x: Int) -> Int {
        return channel*channelStride + y*yStride + x*xStride
    }

    let topLeftGreenPixel = UInt8(imageData[pixelOffset(1, 0, 0)])

Replies

Glad to hear that!


It's a bit curious that it's not working with Vision, though. Maybe it's because your output is grayscale and Vision expects RGB?

Dear Frank and ktak199, I think I am facing the same issue.

I converted my Torch model for style transfer to Core ML using the torch2coreml converter.


As soon as I access the pixelBuffer property of the VNPixelBufferObservation in the completionHandler of the VNCoreMLRequest, the program crashes with EXC_BAD_ACCESS.

Can you confirm that this problem is caused by using the Vision framework, and not by the model conversion procedure?

So, if I use "plain" Core ML, chances are high that the model will work?

Thanks a lot in advance

Oliver

array_shape = tuple(output.type.multiArrayType.shape)

This is returning an empty tuple. I checked the shape of my model in Xcode and the dimension is 1*1*2*224*224, which I guess corresponds to channels=2, height=224, width=224 (no idea about the other two dimensions). So my question is: why is an empty tuple being returned?

I'd also like to know what a channel value of 2 represents. The output of my model was supposed to be grayscale, so the value should have been 1, I guess.

Thanks in advance!

Hmm, it seems there is something off with your model spec. Can you print the output feature description and post it here? It should have a 3-dimensional shape, with only one channel if it's a grayscale image.


The first two dimensions you see are used for internal batch processing and should actually not be exposed in the output of the model.
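

If it helps, here is a minimal sketch of how you could print the raw output descriptions with coremltools (the model path is just a placeholder):

import coremltools

# Load the converted model and inspect its raw protobuf spec.
model = coremltools.models.MLModel('MyModel.mlmodel')  # placeholder path
spec = model.get_spec()

# Each entry shows the feature name, its type, and multiArrayType.shape,
# which is exactly what the helper function above reads.
for output in spec.description.output:
    print(output)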

Unlike others who have used this method (forcing the model to output an image) and gotten back CVPixelBuffers with alpha 0, I am getting back an alpha of 255 and r, g, b values in [0, 1]. Ignoring the alpha channel, PIL displays an image that is definitely related to the desired output. Could it have something to do with the image scale/RGB biases I use when converting the model from Keras? I am not really sure where to go with this; we're trying to convert the CVPixelBuffer output into a Metal texture and want to avoid extra post-processing steps.


For clarity: I am converting from Keras to Core ML, and then calling "convert_multiarray_output_to_image(...)" on the Core ML model. I use an image scale of 1/127.5 and RGB biases of -1 at the Keras conversion step.

Dear Developers:


I converted a Keras model that takes a grayscale single-channel image as input and produces a grayscale single-channel image as output.


Using coremltools, I call coremltools.converters.keras.convert with the following settings:

----------------------

coreml_model = coremltools.converters.keras.convert(model,
                                                    input_names='data',
                                                    image_input_names='data',
                                                    output_names='outputImage',
                                                    image_scale=1/255.0)

------------------------

coreml_model then has:

Inputs data: Image (Grayscale 256x256)

Outputs outputImage: MultiArray (Double 1x256x256)


If I then run the code below:

--------------------------

def convert_multiarray_output_to_image(spec, feature_name, is_bgr=False):
    for output in spec.description.output:
        if output.name != feature_name:
            continue
        if output.type.WhichOneof('Type') != 'multiArrayType':
            raise ValueError("%s is not a multiarray type" % output.name)
        channels, height, width = tuple(output.type.multiArrayType.shape)
        from coremltools.proto import FeatureTypes_pb2 as ft
        if channels == 1:
            output.type.imageType.colorSpace = ft.ImageFeatureType.ColorSpace.Value('GRAYSCALE')
        elif channels == 3:
            if is_bgr:
                output.type.imageType.colorSpace = ft.ImageFeatureType.ColorSpace.Value('BGR')
            else:
                output.type.imageType.colorSpace = ft.ImageFeatureType.ColorSpace.Value('RGB')
        else:
            raise ValueError("Channel value %d not supported for image outputs" % channels)
        output.type.imageType.width = width
        output.type.imageType.height = height

spec = coreml_model.get_spec()
convert_multiarray_output_to_image(spec, 'outputImage', is_bgr=False)
newModel = coremltools.models.MLModel(spec)

------------------------

the resulting model has:

Inputs inputImage: MultiArray (Double 1x256x256)

Outputs outputImage: Image (Grayscale 256x256)



Question:

Is there any way to get a model like this?

Inputs inputImage: Image (Grayscale 256x256)

Outputs outputImage: Image (Grayscale 256x256)



Thanks

Hey Oliver,


Sorry for the late response.

While I can't confirm that this is definitely an issue with the Vision framework, I would at least recommend you give the manual approach a try. It's not that hard and it gives you much more control over the conversion. I don't know why Vision can't handle your model output, though.

Hi Bruce,


You also need to convert the input into an image. You can do that using the same method as for outputs: just replace all instances of "output" with "input" in your convert_multiarray_output_to_image method and you've got yourself a convert_multiarray_input_to_image method (a sketch is below). Then you just need to apply that to the spec as well before creating your model.
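

Here is a minimal sketch of what that input-side variant could look like, assuming the same (channels, height, width) multiarray layout as the output helper; the helper name is just illustrative:

from coremltools.proto import FeatureTypes_pb2 as ft

def convert_multiarray_input_to_image(spec, feature_name, is_bgr=False):
    # Same logic as the output helper, but walking the input features.
    for input_feature in spec.description.input:
        if input_feature.name != feature_name:
            continue
        if input_feature.type.WhichOneof('Type') != 'multiArrayType':
            raise ValueError("%s is not a multiarray type" % input_feature.name)
        channels, height, width = tuple(input_feature.type.multiArrayType.shape)
        if channels == 1:
            input_feature.type.imageType.colorSpace = ft.ImageFeatureType.ColorSpace.Value('GRAYSCALE')
        elif channels == 3:
            color = 'BGR' if is_bgr else 'RGB'
            input_feature.type.imageType.colorSpace = ft.ImageFeatureType.ColorSpace.Value(color)
        else:
            raise ValueError("Channel value %d not supported for image inputs" % channels)
        input_feature.type.imageType.width = width
        input_feature.type.imageType.height = height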

How do I incorporate the bias layer into the conversion process?

You probably need to alter the resulting Core ML model using the coremltools library (on the Python side). In short, add a bias layer at the end that performs the final linear transformation you need.



I’ve written it out in detail here: https://cutecoder.org/programming/core-ml-image-output/


That took me a few weeks to figure out, hence I made a post that’s hopefully useful (and hopefully the functionality can be built into coremltools itself).
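

For illustration, here is a minimal sketch of the idea (not the exact code from the post above), assuming a plain neuralNetwork model whose final output lies in [-1, 1] and needs to be stretched to [0, 255] before the image conversion; the layer and file names are placeholders:

import coremltools

# Load the converted model and grab its protobuf spec.
model = coremltools.models.MLModel('StyleTransfer.mlmodel')  # placeholder
spec = model.get_spec()
nn = spec.neuralNetwork  # assumes a plain neural network model

# Redirect the existing last layer to an internal output name...
last_layer = nn.layers[-1]
final_name = last_layer.output[0]
last_layer.output[0] = final_name + '_prescale'

# ...then append a linear activation layer f(x) = 127.5 * x + 127.5,
# which maps [-1, 1] back into the [0, 255] range an image expects.
scale_layer = nn.layers.add()
scale_layer.name = 'denormalize_output'
scale_layer.input.append(final_name + '_prescale')
scale_layer.output.append(final_name)
scale_layer.activation.linear.alpha = 127.5
scale_layer.activation.linear.beta = 127.5

coremltools.models.MLModel(spec).save('StyleTransferDenormalized.mlmodel')

After appending the layer, you would still run convert_multiarray_output_to_image on the spec so that the rescaled output is exposed as an image.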