32 Replies
      Latest reply: Nov 5, 2017 1:40 PM by OliDem
      FrankSchlegel Level 2 (50 points)

        I'm currently porting image style transfer neural networks to CoreML and it works great so far. The only downside is that the only output format seems to be an MLMultiArray, which I have to (slowly) convert back into an image.

         

        Is there any chance we can get support for image outputs in the future? Or is there a way I can use the output data in Metal so I can do the conversion on the GPU myself?

         

        Anyways, thanks for CoreML! It's great so far and I can't wait to see what's coming in the future.

        • Re: Support for image outputs
          FrankSchlegel Level 2 (50 points)

          I just tried to get the result into a Metal buffer to convert it in a compute shader, but I'm running into the issue that Metal doesn't support doubles. And CoreML seems to always return doubles in the output (even if I tell it to use FLOAT32 in the spec).

          How does CoreML actually do it under the hood when the GPU doesn't support double precision?

          • Re: Support for image outputs
            kerfuffle Level 2 (90 points)

            If your model outputs an image (i.e. something with width, height, and a depth of 3 or 4 channels), then Core ML can interpret that as an image. You need to pass a parameter for this in the coremltools conversion script, so that Core ML knows this output should be interpreted as an image.

              • Re: Support for image outputs
                FrankSchlegel Level 2 (50 points)

                How do I do that? The NeuralNetworkBuilder only has a method for pre-processing image inputs, but not for post-processing outputs. If I try to convert the type of the output directly in the spec, the model compiles (and Xcode shows the format correctly), but the result is wrong.

                  • Re: Support for image outputs
                    kerfuffle Level 2 (90 points)

                    I guess the output only becomes an image if you specify `class_labels` in the call to convert(), but you're not really building a classifier so that wouldn't work. So what I had in mind is not actually a solution to your problem.

                     

                    This is why I prefer implementing neural networks with MPS. ;-)

                    • Re: Support for image outputs
                      michael_s Apple Staff (20 points)

                      While the NeuralNetworkBuilder currently does not have options for image outputs, you can use coremltools to modify the resulting model so that the desired multiarray output is treated as an image.

                       

                      Here is an example helper function:

                      
                      def convert_multiarray_output_to_image(spec, feature_name, is_bgr=False):
                          """
                          Convert an output multiarray to be represented as an image
                          This will modify the Model_pb spec passed in.
                          Example:
                              model = coremltools.models.MLModel('MyNeuralNetwork.mlmodel')
                              spec = model.get_spec()
                              convert_multiarray_output_to_image(spec,'imageOutput',is_bgr=False)
                              newModel = coremltools.models.MLModel(spec)
                              newModel.save('MyNeuralNetworkWithImageOutput.mlmodel')
                          Parameters
                          ----------
                          spec: Model_pb
                              The specification containing the output feature to convert
                          feature_name: str
                              The name of the multiarray output feature you want to convert
                          is_bgr: boolean
                              If the multiarray has 3 channels, set to True for BGR pixel order, False for RGB
                          """
                          for output in spec.description.output:
                              if output.name != feature_name:
                                  continue
                              if output.type.WhichOneof('Type') != 'multiArrayType':
                                  raise ValueError("%s is not a multiarray type" % output.name)
                              array_shape = tuple(output.type.multiArrayType.shape)
                              channels, height, width = array_shape
                              from coremltools.proto import FeatureTypes_pb2 as ft
                              if channels == 1:
                                  output.type.imageType.colorSpace = ft.ImageFeatureType.ColorSpace.Value('GRAYSCALE')
                              elif channels == 3:
                                  if is_bgr:
                                      output.type.imageType.colorSpace = ft.ImageFeatureType.ColorSpace.Value('BGR')
                                  else:
                                      output.type.imageType.colorSpace = ft.ImageFeatureType.ColorSpace.Value('RGB')
                              else:
                                  raise ValueError("Channel Value %d not supported for image inputs" % channels)
                              output.type.imageType.width = width
                              output.type.imageType.height = height
                      
                      

                       

                      Note: a neural network can output an image from a layer (as a CVPixelBuffer), but the values are clamped to [0, 255]: values < 0 become 0, and values > 255 become 255.

                       

                      You can also just keep the output an MLMultiArray and index into pixels with something like this in Swift:

                          let imageData = UnsafeMutablePointer<Double>(OpaquePointer(imageMultiArray.dataPointer))

                          let channelStride = imageMultiArray.strides[0].intValue
                          let yStride = imageMultiArray.strides[1].intValue
                          let xStride = imageMultiArray.strides[2].intValue

                          func pixelOffset(_ channel: Int, _ y: Int, _ x: Int) -> Int {
                              return channel*channelStride + y*yStride + x*xStride
                          }

                          let topLeftGreenPixel = UInt8(imageData[pixelOffset(1, 0, 0)])
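
                      And if you want the whole multiarray as an image, here is a sketch (assuming a (3, height, width) Double multiarray; the helper name and the clamping are just illustrative choices, not a fixed API):

                          import CoreML
                          import CoreGraphics

                          // Sketch: turn a (3, H, W) Double MLMultiArray into an RGBA CGImage on the CPU.
                          func cgImage(from multiArray: MLMultiArray) -> CGImage? {
                              let channels = multiArray.shape[0].intValue
                              let height = multiArray.shape[1].intValue
                              let width = multiArray.shape[2].intValue
                              guard channels == 3 else { return nil }

                              let data = UnsafeMutablePointer<Double>(OpaquePointer(multiArray.dataPointer))
                              let cStride = multiArray.strides[0].intValue
                              let yStride = multiArray.strides[1].intValue
                              let xStride = multiArray.strides[2].intValue

                              // Interleave the planar channel data into RGBA bytes (alpha stays 255).
                              var pixels = [UInt8](repeating: 255, count: width * height * 4)
                              for y in 0..<height {
                                  for x in 0..<width {
                                      let offset = (y * width + x) * 4
                                      for c in 0..<3 {
                                          let value = data[c * cStride + y * yStride + x * xStride]
                                          pixels[offset + c] = UInt8(max(0, min(255, value)))  // clamp to [0, 255]
                                      }
                                  }
                              }

                              return pixels.withUnsafeMutableBytes { buffer in
                                  CGContext(data: buffer.baseAddress, width: width, height: height,
                                            bitsPerComponent: 8, bytesPerRow: width * 4,
                                            space: CGColorSpaceCreateDeviceRGB(),
                                            bitmapInfo: CGImageAlphaInfo.noneSkipLast.rawValue)?.makeImage()
                              }
                          }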
                      
                        • Re: Support for image outputs
                          FrankSchlegel Level 2 (50 points)

                          I actually tried pretty much exactly that (tagging the output as image). But my problem is that I can't seem to get the CVPixelBuffer back into some displayable format. I'm gonna keep trying.

                           

                          I still don't really understand how CoreML can give me doubles when it's doing its computation on the GPU, though...

                          • Re: Support for image outputs
                            BrianOn99 Level 1 (0 points)

                            To use the helper you provided, "convert_multiarray_output_to_image", does the model output dataType need to be DOUBLE or INT32, or does either work?

                              • Re: Support for image outputs
                                FrankSchlegel Level 2 (50 points)

                                There is no requirement for the type, I think. The output has to be in the shape (channels, height, width) and have either 1 or 3 channels. I guess the internal conversion step will handle the rest.

                                  • Re: Support for image outputs
                                    BrianOn99 Level 1 (0 points)

                                    Thanks a lot.  I will try this out.

                                    • Re: Support for image outputs
                                      BrianOn99 Level 1 (0 points)

                                      Edit:

                                      I have solved the problem by taking another approach. I take the CVPixelBufferGetBaseAddress(myCVPixelBuffer) pointer and then directly set the alpha channel to 255 to "remove" the alpha channel, as sketched below. This approach might be a bit raw, as it uses an unsafe pointer and assumes the alpha bytes are located at offsets 3, 7, 11, ... But at least it works.
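
                                      A sketch of that approach (pixelBuffer here stands for your 32-bit RGBA/BGRA buffer):

                                      import CoreVideo

                                      // Sketch: force the alpha channel of a 32-bit-per-pixel buffer to 255.
                                      CVPixelBufferLockBaseAddress(pixelBuffer, [])
                                      let width = CVPixelBufferGetWidth(pixelBuffer)
                                      let height = CVPixelBufferGetHeight(pixelBuffer)
                                      let bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer)
                                      let base = CVPixelBufferGetBaseAddress(pixelBuffer)!.assumingMemoryBound(to: UInt8.self)
                                      for y in 0..<height {
                                          let row = base + y * bytesPerRow
                                          for x in 0..<width {
                                              row[x * 4 + 3] = 255  // alpha bytes sit at offsets 3, 7, 11, ...
                                          }
                                      }
                                      CVPixelBufferUnlockBaseAddress(pixelBuffer, [])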

                                       

                                      [Old message]

                                      I think I have got the same problem of getting a "not displayable format"; in other words, the image does not show in Quick Look. I can already get the image when an MLMultiArray is returned, but not when the output is converted to a CVPixelBuffer in the mlmodel. Sorry, I am quite new to iOS development, so I don't understand how to use "CGImageAlphaInfo.noneSkipLast". I guessed that I need to do the conversion CVPixelBuffer -> CIImage -> CGImage, then create a CGContext with CGImageAlphaInfo.noneSkipLast, draw the CGImage into that context, and finally get a CGImage back from the context.

                                       

                                      But somehow the image becomes black. Is this the correct approach? Could you please share some steps of your approach?

                                       

                                      Thanks for all your help.

                                        • Re: Support for image outputs
                                           FrankSchlegel Level 2 (50 points)

                                          For me this works without changing the pixel buffer (output is your CVPixelBuffer):

                                           

                                           CVPixelBufferLockBaseAddress(output, .readOnly)

                                           let width = CVPixelBufferGetWidth(output)
                                           let height = CVPixelBufferGetHeight(output)
                                           let data = CVPixelBufferGetBaseAddress(output)!

                                           let outContext = CGContext(data: data, width: width, height: height,
                                                                      bitsPerComponent: 8,
                                                                      bytesPerRow: CVPixelBufferGetBytesPerRow(output),
                                                                      space: CGColorSpaceCreateDeviceRGB(),
                                                                      bitmapInfo: CGImageAlphaInfo.noneSkipLast.rawValue)!
                                           let outImage = outContext.makeImage()!

                                           CVPixelBufferUnlockBaseAddress(output, .readOnly)
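
                                           If you then need a UIImage for display, wrapping the CGImage is a one-liner:

                                           let uiImage = UIImage(cgImage: outImage)  // requires import UIKit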
                                          
                                    • Re: Support for image outputs
                                      lozanoleonardo Level 1 (0 points)

                                      Hey Michael, can I use this script to convert my MLMultiArray input to be an Image instead? I'm guessing that it needs to consider

                                      spec.description.input
                                      

                                      in this case.

                                        • Re: Support for image outputs
                                          BrianOn99 Level 1 (0 points)

                                          Just for your reference, if your network's original output shape is [3, 511, 511], then after conversion to a CVPixelBuffer output, the diff is only:

                                           

                                          --- oil_mlmodel_yespost.pb 2017-07-10 11:00:21.078301960 +0800
                                          +++ oil_mlmodel_yespost_cvbuffer.pb 2017-07-10 10:59:38.374233180 +0800
                                          @@ -13,11 +13,10 @@
                                             output {
                                               name: "output"
                                                 type {
                                          -        multiArrayType {
                                          -          shape: 3
                                          -          shape: 511
                                          -          shape: 511
                                          -          dataType: DOUBLE
                                          +        imageType {
                                          +          width: 511
                                          +          height: 511
                                          +          colorSpace: RGB
                                                   }
                                                }
                                             }
                                          
                                          

                                           

                                           

                                          And in my case I just need to run convert_multiarray_output_to_image(tm, 'output'), where tm is my model's spec.  I don't need to specify the input.

                                        • Re: Support for image outputs
                                          fakrueger Level 1 (0 points)

                                          Thanks for those details, it really helped! One question though:

                                           

                                          How would you apply scale and biases to the image data before conversion? For instance, my network outputs in the range [-1, 1] and I need to convert that to [0, 255].

                                           

                                          When using the Keras converter, the input image can thankfully be scaled and biased. Can that code be reused somehow?

                                            • Re: Support for image outputs
                                              BrianOn99 Level 1 (0 points)

                                              I have tackled a similar post-processing problem (subtracting the VGG constants) by manually inserting a 1x1 convolution. For your particular problem, you may try adding a 1x1x3x3 conv layer (1x1 kernel, 3 input and 3 output channels) with the weights

                                              [127.5,   0,     0,
                                                 0,   127.5,   0,
                                                 0,     0,   127.5]

                                              and the bias

                                              [127.5, 127.5, 127.5]

                                              Place it after your model's final layer.

                                               

                                              This operation will scale each channel separately into [-127.5, 127.5] and then add 127.5 to each channel. I have not tried it, but it should give you a direction to work on.

                                               

                                              As an aside, resetting the alpha channel is no longer required as of Xcode beta 5.
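
                                              If you would rather leave the model untouched, the same mapping can also be done on the app side when reading the multiarray values; a minimal sketch:

                                              // Sketch: map a network output in [-1, 1] to a displayable byte in [0, 255].
                                              func byteValue(_ raw: Double) -> UInt8 {
                                                  let scaled = raw * 127.5 + 127.5        // [-1, 1] -> [0, 255]
                                                  return UInt8(max(0, min(255, scaled)))  // clamp against slight overshoot
                                              }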

                                                • Re: Support for image outputs
                                                  FrankSchlegel Level 2 (50 points)

                                                  There's actually a bias layer (and a scale layer) in the spec that can do exactly that. No need for the work-around. Here is my helper for the NeuralNetworkBuilder:

                                                   

                                                  # Assuming the usual import for the builder class:
                                                  from coremltools.models.neural_network import NeuralNetworkBuilder as _NeuralNetworkBuilder

                                                  def _add_bias(self, name, bias, shape, input_name, output_name):
                                                      """
                                                      Add a bias layer to the model.

                                                      Parameters
                                                      ----------
                                                      name: str
                                                          The name of this layer.
                                                      bias: numpy.array
                                                          The biases to apply to the inputs.
                                                          The size must be equal to the product of the ``shape`` dimensions.
                                                      shape: tuple
                                                          The shape of the bias. Must be one of the following:
                                                          ``[1]``, ``[C]``, ``[1, H, W]`` or ``[C, H, W]``.
                                                      input_name: str
                                                          The input blob name of this layer.
                                                      output_name: str
                                                          The output blob name of this layer.
                                                      """
                                                      nn_spec = self.nn_spec
                                                      # Add a new bias layer to the network
                                                      spec_layer = nn_spec.layers.add()
                                                      spec_layer.name = name
                                                      spec_layer.input.append(input_name)
                                                      spec_layer.output.append(output_name)
                                                      spec_layer_params = spec_layer.bias
                                                      spec_layer_params.bias.floatValue.extend(map(float, bias.flatten()))
                                                      spec_layer_params.shape.extend(shape)

                                                  _NeuralNetworkBuilder.add_bias = _add_bias
                                              • Re: Support for image outputs
                                                ktak199 Level 1 (0 points)

                                                I'm trying to convert outputs of mlmodel to UIImage, but it's not working...

                                                Output of mlmodel: Image (Grayscale, width x height)

                                                 

                                                guard let results = request.results as? [VNPixelBufferObservation] else { fatalError("Fatal error") }
                                                print(String(describing: type(of: results)))
                                                print(String(describing: type(of: results[0])))
                                                let ciImage = CIImage(cvPixelBuffer: results[0].pixelBuffer)
                                                
                                                

                                                 

                                                Outputs:

                                                Array<VNPixelBufferObservation>

                                                VNPixelBufferObservation

                                                 

                                                The error occurs on line 4:

                                                Thread 1: EXC_BAD_ACCESS (code=1, address=0xe136dbec8)

                                                 

                                                -------------------------------------------------------------------------------------------

                                                I'm also trying the approach that keeps the mlmodel output as a MultiArray.

                                                 

                                                Output of mlmodel: MultiArray (Double, 1 x width x height)

                                                 

                                                guard let results = request.results as? [VNCoreMLFeatureValueObservation] else { fatalError("Fatal error") }

                                                print(String(describing: type(of: results)))
                                                print(String(describing: type(of: results[0])))
                                                print(String(describing: type(of: results[0].featureValue)))
                                                print(results[0].featureValue)
                                                print(results[0].featureValue.multiArrayValue)

                                                let imageMultiArray = results[0].featureValue.multiArrayValue!
                                                let imageData = UnsafeMutablePointer<Double>(OpaquePointer(imageMultiArray.dataPointer))
                                                let channelStride = imageMultiArray.strides[0].intValue
                                                let yStride = imageMultiArray.strides[1].intValue
                                                let xStride = imageMultiArray.strides[2].intValue

                                                func pixelOffset(_ channel: Int, _ y: Int, _ x: Int) -> Int {
                                                    return channel*channelStride + y*yStride + x*xStride
                                                }

                                                let topLeftGreenPixel = Unit8(imageData[pixelOffset(1, 0, 0)])  // <- compile error here
                                                

                                                 

                                                Outputs:

                                                Array<VNCoreMLFeatureValueObservation>

                                                VNCoreMLFeatureValueObservation

                                                Optional<MLFeatureValue>

                                                Optional(MultiArray : Double 1 x width x height array)

                                                Optional(Double 1 x width x height array)

                                                 

                                                The compile error occurs on the last line: Use of unresolved identifier 'Unit8'.

                                                Should it be replaced by UInt8? And how do I convert the result to a UIImage?

                                                 

                                                 

                                                Thank you for your any help in advance!

                                                  • Re: Support for image outputs
                                                    FrankSchlegel Level 2 (50 points)

                                                    Yes, Unit8 is a typo and should be UInt8.

                                                     

                                                    But regardless, you should really try to adjust the spec of your model to produce image outputs (instead of a multi-array). That way you should be able to get the pixel buffer directly from the prediction. In the answer above you can see the Python function that can do that for you.

                                                      • Re: Support for image outputs
                                                        ktak199 Level 1 (0 points)

                                                        Thank you for your response.

                                                         

                                                        The output of my mlmodel is already an image (Grayscale, width x height).


                                                        No error occurs when getting the results on line 1.

                                                        But the app crashes when I try to access the pixelBuffer on line 4.

                                                        Error message: Thread 1: EXC_BAD_ACCESS (code=1, address=0xe136dbec8)

                                                        guard let results = request.results as? [VNPixelBufferObservation] else { fatalError("Fatal error") }
                                                        print(String(describing: type(of: results)))   // -> Array<VNPixelBufferObservation>
                                                        print(String(describing: type(of: results[0]))) // -> VNPixelBufferObservation
                                                        let ciImage = CIImage(cvPixelBuffer: results[0].pixelBuffer)
                                                        
                                                        

                                                         

                                                        Thank you.

                                                          • Re: Support for image outputs
                                                            FrankSchlegel Level 2 (50 points)

                                                            Ok, that's indeed strange. Do you get any console output?

                                                             

                                                            Also maybe try to use your model with CoreML directly, without Vision. Do you get the same error there?

                                                              • Re: Support for image outputs
                                                                ktak199 Level 1 (0 points)
                                                                print(String(describing: type(of: results)))
                                                                print(String(describing: type(of: results[0])))
                                                                let ciImage = CIImage(cvPixelBuffer: results[0].pixelBuffer)
                                                                

                                                                Output is

                                                                Array<VNPixelBufferObservation>

                                                                VNPixelBufferObservation

                                                                Thread 1: EXC_BAD_ACCESS (code=1, address=0xe136dbec8)

                                                                 

                                                                How do I use CoreML directly, without Vision? Any URL?

                                                                  • Re: Support for image outputs
                                                                    FrankSchlegel Level 2 (50 points)

                                                                    It's a bit tedious because you have to do the pixel buffer conversion yourself. Check out the repo hollance/CoreMLHelpers on GitHub to see how it's done (sorry for no link, but I wanted to avoid waiting for moderation).
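
                                                                    The prediction call itself is simple; a sketch (MyStyleTransfer and its feature names below are placeholders for whatever Xcode generates from your model):

                                                                    import CoreML

                                                                    // Sketch: run the model directly, without Vision. "MyStyleTransfer",
                                                                    // "image" and "stylizedImage" are placeholder names.
                                                                    let model = MyStyleTransfer()
                                                                    if let output = try? model.prediction(image: inputPixelBuffer) {
                                                                        let stylized: CVPixelBuffer = output.stylizedImage
                                                                        // ... convert stylized to a displayable image ...
                                                                    }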

                                                                      • Re: Support for image outputs
                                                                        ktak199 Level 1 (0 points)

                                                                        It worked! Thank you.

                                                                          • Re: Support for image outputs
                                                                            FrankSchlegel Level 2 (50 points)

                                                                            Glad to hear that!

                                                                             

                                                                            It's a bit curious that it's not working with Vision, though. Maybe it's because your output is grayscale and Vision expects RGB?

                                                                              • Re: Support for image outputs
                                                                                OliDem Level 1 (0 points)

                                                                                Dear Frank and ktak199, I think I am facing the same issue.

                                                                                I converted my torch model for style transfer to Core ML using the torch2coreml converter.

                                                                                 

                                                                                As soon as I access the pixelBuffer property of the VNPixelBufferObservation in the completionHandler of the VNCoreMLRequest, the program crashes with EXC_BAD_ACCESS.

                                                                                Can you confirm that this problem is caused by using the Vision framework, and not by the model conversion procedure?

                                                                                So, if I use "plain" CoreML, chances are high that the model will work?

                                                                                Thanks a lot in advance

                                                                                Oliver