23 Replies
      Latest reply: Aug 24, 2017 2:47 AM by FrankSchlegel
      FrankSchlegel Level 1 (20 points)

        I'm currently porting image style transfer neural networks to CoreML, and it works great so far. The only downside is that the only output format seems to be an MLMultiArray, which I have to (slowly) convert back into an image.


        Is there any chance we can get support for image outputs in the future? Or is there a way I can use the output data in Metal so I can do the conversion on the GPU myself?


        Anyways, thanks for CoreML! It's great so far and I can't wait to see what's coming in the future.

        • Re: Support for image outputs
          FrankSchlegel Level 1 (20 points)

          I just tried to get the result into a Metal buffer to convert it in a compute shader, but I'm running into the issue that Metal doesn't support doubles. And CoreML seems to always return doubles in the output (even if I tell it to use FLOAT32 in the spec).

          How does CoreML actually do it under the hood when the GPU doesn't support double precision?

          • Re: Support for image outputs
            kerfuffle Level 2 (80 points)

            If your model outputs an image (i.e. something with width, height, and a depth of 3 or 4 channels), then Core ML can interpret that as an image. You need to pass a parameter for this in the coremltools conversion script, so that Core ML knows this output should be interpreted as an image.

              • Re: Support for image outputs
                FrankSchlegel Level 1 (20 points)

                How do I do that? The NeuralNetworkBuilder only has a method for pre-processing image inputs, not for post-processing outputs. If I try to convert the type of the output directly in the spec, the model compiles (and Xcode shows the format correctly), but the result is wrong.

                  • Re: Support for image outputs
                    kerfuffle Level 2 (80 points)

                    I guess the output only becomes an image if you specify `class_labels` in the call to convert(), but you're not really building a classifier so that wouldn't work. So what I had in mind is not actually a solution to your problem.


                    This is why I prefer implementing neural networks with MPS. ;-)

                    • Re: Support for image outputs
                      michael_s Apple Staff (20 points)

                      While the NeuralNetworkBuilder currently does not have options for image outputs, you can use coremltools to modify the resulting model so that the desired multiarray output is treated as an image.


                      Here is an example helper function:

                      def convert_multiarray_output_to_image(spec, feature_name, is_bgr=False):
                          """
                          Convert an output multiarray to be represented as an image.
                          This will modify the Model_pb spec passed in.

                          Example:
                              model = coremltools.models.MLModel('MyNeuralNetwork.mlmodel')
                              spec = model.get_spec()
                              convert_multiarray_output_to_image(spec, 'imageOutput', is_bgr=False)
                              newModel = coremltools.models.MLModel(spec)

                          Parameters
                          ----------
                          spec: Model_pb
                              The specification containing the output feature to convert
                          feature_name: str
                              The name of the multiarray output feature you want to convert
                          is_bgr: boolean
                              If the multiarray has 3 channels, set to True for BGR pixel order, False for RGB
                          """
                          for output in spec.description.output:
                              if output.name != feature_name:
                                  continue
                              if output.type.WhichOneof('Type') != 'multiArrayType':
                                  raise ValueError("%s is not a multiarray type" % output.name)
                              array_shape = tuple(output.type.multiArrayType.shape)
                              channels, height, width = array_shape
                              from coremltools.proto import FeatureTypes_pb2 as ft
                              if channels == 1:
                                  output.type.imageType.colorSpace = ft.ImageFeatureType.ColorSpace.Value('GRAYSCALE')
                              elif channels == 3:
                                  if is_bgr:
                                      output.type.imageType.colorSpace = ft.ImageFeatureType.ColorSpace.Value('BGR')
                                  else:
                                      output.type.imageType.colorSpace = ft.ImageFeatureType.ColorSpace.Value('RGB')
                              else:
                                  raise ValueError("Channel Value %d not supported for image outputs" % channels)
                              output.type.imageType.width = width
                              output.type.imageType.height = height


                      Note: Neural networks can output images from a layer (as CVPixelBuffer), but it clamps the values between 0 and 255, i.e. values < 0 become 0 and values > 255 become 255.
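                      That clamping behaves like a clip to [0, 255] followed by a cast to 8-bit. A quick numpy sketch of the effect (not Core ML's actual implementation):

```python
import numpy as np

# Hypothetical raw layer outputs before clamping to [0, 255]
raw = np.array([-20.0, 0.0, 128.0, 300.0])

# Clamp, then cast to 8-bit as a CVPixelBuffer would store it
clamped = np.clip(raw, 0, 255).astype(np.uint8)
# -20 -> 0, 300 -> 255, in-range values pass through unchanged
```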


                      You can also just keep the output an MLMultiArray and index into pixels with something like this in Swift:

                          let imageData = UnsafeMutablePointer<Double>(OpaquePointer(imageMultiArray.dataPointer))
                          let channelStride = imageMultiArray.strides[0].intValue
                          let yStride = imageMultiArray.strides[1].intValue
                          let xStride = imageMultiArray.strides[2].intValue
                          func pixelOffset(_ channel: Int, _ y: Int, _ x: Int) -> Int {
                              return channel*channelStride + y*yStride + x*xStride
                          }
                          let topLeftGreenPixel = UInt8(imageData[pixelOffset(1, 0, 0)])
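                      The stride arithmetic can be sanity-checked in Python with numpy, using a hypothetical 3-channel, 4x5 output laid out as (channels, height, width):

```python
import numpy as np

# Hypothetical output array in (channels, height, width) order,
# mirroring the MLMultiArray layout from the Swift snippet
image = np.arange(3 * 4 * 5, dtype=np.float64).reshape(3, 4, 5)
flat = image.ravel()

# Strides in element counts, like MLMultiArray.strides
channel_stride = image.strides[0] // image.itemsize  # 20
y_stride = image.strides[1] // image.itemsize        # 5
x_stride = image.strides[2] // image.itemsize        # 1

def pixel_offset(channel, y, x):
    return channel * channel_stride + y * y_stride + x * x_stride

# "Green" (channel 1) value of the top-left pixel
top_left_green = flat[pixel_offset(1, 0, 0)]
```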
                        • Re: Support for image outputs
                          FrankSchlegel Level 1 (20 points)

                          I actually tried pretty much exactly that (tagging the output as an image). But my problem is that I can't seem to get the CVPixelBuffer back into some displayable format. I'm going to keep trying.


                          I still don't really understand how CoreML can give me doubles when it's doing its computation on the GPU, though...

                          • Re: Support for image outputs
                            BrianOn99 Level 1 (0 points)

                            To use the helper you provided, "convert_multiarray_output_to_image", does the model output dataType need to be DOUBLE or INT32, or both?

                              • Re: Support for image outputs
                                FrankSchlegel Level 1 (20 points)

                                There is no requirement for the type, I think. The output has to be in the shape (channels, height, width) and have either 1 or 3 channels. I guess the internal conversion step will handle the rest.

                                  • Re: Support for image outputs
                                    BrianOn99 Level 1 (0 points)

                                    Thanks a lot.  I will try this out.

                                    • Re: Support for image outputs
                                      BrianOn99 Level 1 (0 points)


                                      I have solved the problem by taking another approach. I take CVPixelBufferGetBaseAddress(myCVPixelBuffer) and then directly set the alpha channel to 255 to "remove" the alpha channel. This approach might be a bit raw, as it uses UnsafeRawPointer and assumes the alpha bytes are located at offsets 3, 7, 11, ... But at least it works.


                                      [Old message]

                                      I think I have the same problem of getting a "not displayable format"; in other words, the image does not show in Quick Look. I can already get the image when an MLMultiArray is returned, but not when the mlmodel is converted to output a CVPixelBuffer. Sorry, I am quite new to iOS development, so I don't understand how to use "CGImageAlphaInfo.noneSkipLast". I guessed that I need to do the conversion CVPixelBuffer -> CIImage -> CGImage, then create a CGContext with CGImageAlphaInfo.noneSkipLast, draw the CGImage into the CGContext, and finally get a CGImage from the CGContext.


                                      But somehow the image becomes black. Is this the correct approach? Could you please share the steps of your approach?


                                      Thanks for all your help.

                                        • Re: Support for image outputs
                                          FrankSchlegel Level 1 (20 points)

                                          For me this works without changing the pixel buffer (output is your CVPixelBuffer):


                                          CVPixelBufferLockBaseAddress(output, .readOnly)
                                          let width = CVPixelBufferGetWidth(output)
                                          let height = CVPixelBufferGetHeight(output)
                                          let data = CVPixelBufferGetBaseAddress(output)!
                                          let outContext = CGContext(data: data, width: width, height: height, bitsPerComponent: 8, bytesPerRow: CVPixelBufferGetBytesPerRow(output), space: CGColorSpaceCreateDeviceRGB(), bitmapInfo: CGImageAlphaInfo.noneSkipLast.rawValue)!
                                          let outImage = outContext.makeImage()!
                                          CVPixelBufferUnlockBaseAddress(output, .readOnly)
                                    • Re: Support for image outputs
                                      lozanoleonardo Level 1 (0 points)

                                      Hey Michael, can I use this script to convert my MLMultiArray input to be an Image instead? I'm guessing that it needs to consider


                                      in this case.

                                        • Re: Support for image outputs
                                          BrianOn99 Level 1 (0 points)

                                          Just for your reference: if your network's original output shape is [3, 511, 511], then after conversion to a CVPixelBuffer output, the diff is only:


                                          --- oil_mlmodel_yespost.pb 2017-07-10 11:00:21.078301960 +0800
                                          +++ oil_mlmodel_yespost_cvbuffer.pb 2017-07-10 10:59:38.374233180 +0800
                                          @@ -13,11 +13,10 @@
                                             output {
                                               name: "output"
                                                 type {
                                          -        multiArrayType {
                                          -          shape: 3
                                          -          shape: 511
                                          -          shape: 511
                                          -          dataType: DOUBLE
                                          +        imageType {
                                          +          width: 511
                                          +          height: 511
                                          +          colorSpace: RGB



                                          And in my case I just needed to run convert_multiarray_output_to_image(tm, 'output'), where tm is my model's spec. I didn't need to specify the input.

                                        • Re: Support for image outputs
                                          fakrueger Level 1 (0 points)

                                          Thanks for those details, it really helped! One question though:


                                          How would you apply scale and biases to the image data before conversion? For instance, my network outputs in the range [-1, 1] and I need to convert that to [0, 255].


                                          When using the Keras converter, the input image can thankfully be scaled and biased. Can that code be reused somehow?

                                            • Re: Support for image outputs
                                              BrianOn99 Level 1 (0 points)

                                              I have tackled a similar post-processing problem (subtracting the VGG mean constants) by manually inserting a 1x1 convolution. For your particular problem, you may try adding a 1x1x3x3 conv layer (1x1 kernel, 3 input and 3 output channels) with weights

                                              [127.5,   0.0,   0.0,
                                                 0.0, 127.5,   0.0,
                                                 0.0,   0.0, 127.5]

                                              and bias

                                              [127.5, 127.5, 127.5]


                                              Place it after your model's final layer.


                                              This operation will scale each channel separately into [-127.5, 127.5] and then add 127.5 to each channel. I have not tried it; it's just to give you a direction to work on.
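                                              The arithmetic can be checked with a small numpy sketch of what that 1x1 convolution computes per pixel (hypothetical values):

```python
import numpy as np

# 1x1 conv weights: per-channel scale of 127.5 (diagonal), plus bias 127.5
weight = np.diag([127.5, 127.5, 127.5])
bias = np.array([127.5, 127.5, 127.5])

# One hypothetical pixel with channel values spanning [-1, 1]
pixel = np.array([-1.0, 0.0, 1.0])

scaled = weight @ pixel + bias
# maps -1 -> 0, 0 -> 127.5, 1 -> 255
```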


                                              As an aside, resetting the alpha channel is no longer required as of Xcode beta 5.

                                                • Re: Support for image outputs
                                                  FrankSchlegel Level 1 (20 points)

                                                  There's actually a bias layer in the spec that does exactly that, so there's no need for the workaround. Here is my helper for the NeuralNetworkBuilder:


                                                  def _add_bias(self, name, bias, shape, input_name, output_name):
                                                      """
                                                      Add a bias layer to the model.

                                                      Parameters
                                                      ----------
                                                      name: str
                                                          The name of this layer.
                                                      bias: numpy.array
                                                          The biases to apply to the inputs.
                                                          The size must be equal to the product of the ``shape`` dimensions.
                                                      shape: tuple
                                                          The shape of the bias. Must be one of the following:
                                                          ``[1]``, ``[C]``, ``[1, H, W]`` or ``[C, H, W]``.
                                                      input_name: str
                                                          The input blob name of this layer.
                                                      output_name: str
                                                          The output blob name of this layer.
                                                      """
                                                      nn_spec = self.nn_spec

                                                      # Add a new bias layer
                                                      spec_layer = nn_spec.layers.add()
                                                      spec_layer.name = name
                                                      spec_layer.input.append(input_name)
                                                      spec_layer.output.append(output_name)

                                                      spec_layer_params = spec_layer.bias
                                                      spec_layer_params.shape.extend(shape)
                                                      spec_layer_params.bias.floatValue.extend(map(float, bias.flatten()))

                                                  _NeuralNetworkBuilder.add_bias = _add_bias