How to implement Image Captioning in Core ML and where to find Image Captioning .mlmodel for iOS?

For my iOS app, I need an Image Captioning model (.mlmodel file) that returns a text description of what is shown in the input image, or at least a list of words or tags describing it.


The .mlmodel should take an image as input and return text.


I know how to get the dominant objects in an image using the GoogLeNetPlaces, MobileNet, and SqueezeNet models.

This example shows image detection (not Image Captioning).
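
To make the difference concrete, here is a minimal sketch of the classification approach that already works for me: it returns tags, not a caption. It assumes the MLModel passed in is an image classifier such as MobileNet; the function names topTags and tags(for:using:), the 0.1 confidence threshold, and the limit of 5 tags are my own illustrative choices, not part of any SDK.

```swift
import CoreML
import Vision
import CoreGraphics

/// Keeps labels at or above a confidence threshold, best first, up to maxCount.
/// (Hypothetical helper; threshold and count are arbitrary defaults.)
func topTags(from scores: [(label: String, confidence: Float)],
             minConfidence: Float = 0.1,
             maxCount: Int = 5) -> [String] {
    return scores
        .filter { $0.confidence >= minConfidence }
        .sorted { $0.confidence > $1.confidence }
        .prefix(maxCount)
        .map { $0.label }
}

/// Runs an image classifier through Vision and returns its labels as tags.
/// Assumes `model` is a classifier, so Vision yields VNClassificationObservation results.
func tags(for image: CGImage, using model: MLModel) throws -> [String] {
    let visionModel = try VNCoreMLModel(for: model)
    let request = VNCoreMLRequest(model: visionModel)
    request.imageCropAndScaleOption = .centerCrop

    // perform(_:) is synchronous, so results are available once it returns.
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])

    let observations = request.results as? [VNClassificationObservation] ?? []
    return topTags(from: observations.map {
        (label: $0.identifier, confidence: $0.confidence)
    })
}
```

With an Xcode-generated model class, the call would look something like `try tags(for: cgImage, using: MobileNet().model)`. The output is only a list of labels, which is why this does not count as captioning: nothing here produces a sentence.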


During my research, I've found these Image Captioning solutions and articles, but none of them provides a .mlmodel that I could use for Image Captioning.

Check these examples:

I have not found any existing, working .mlmodel that allows me to do Image Captioning on iOS.

I know that Caffe and Keras models can be converted to .mlmodel, but I didn't find any model that does what I need.

Image Captioning Examples:

I need functionality similar to this:

I would appreciate any answers or links that can help me achieve Image Captioning on iOS.

Replies

These examples work:

The Fritz pod dependencies should be updated:

pod "Fritz", "7.0.1"
pod "Fritz/VisionPoseModel/Human/Fast", "7.0.1"

More information here: https://github.com/fritzlabs/fritz-ai-ios-sdk

But there are some issues with accuracy.