How to implement Image Captioning in Core ML and where to find Image Captioning .mlmodel for iOS?

Question

For my iOS app, I need an Image Captioning model (a .mlmodel file) that returns a text description of what the input image shows, or at least a list of words or tags describing it.

The .mlmodel should take an image as input and return text.
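
To make the "image in, text out" requirement concrete: captioning models are commonly built as an image encoder plus a text decoder that emits one token at a time, so the app typically has to run a small decoding loop around the model. Below is a minimal sketch of greedy decoding in plain Swift. Everything in it is an assumption for illustration: `DecoderStep`, `greedyCaption`, the toy vocabulary, and the scripted `toyStep` are hypothetical stand-ins, not part of Core ML or of any published model.

```swift
/// Hypothetical decoder step: given image features and the tokens produced
/// so far, return one score per vocabulary entry. In a real app this would
/// wrap a prediction call on a converted model (an assumption, not shown here).
typealias DecoderStep = (_ imageFeatures: [Float], _ tokensSoFar: [Int]) -> [Float]

/// Greedy decoding: repeatedly pick the highest-scoring token until the end
/// token appears or `maxLength` words have been produced.
func greedyCaption(imageFeatures: [Float],
                   vocabulary: [String],
                   startToken: Int,
                   endToken: Int,
                   maxLength: Int = 20,
                   step: DecoderStep) -> String {
    var tokens = [startToken]
    while tokens.count - 1 < maxLength {
        let scores = step(imageFeatures, tokens)
        // Index of the highest score; stop if the decoder returned nothing.
        guard let best = scores.indices.max(by: { scores[$0] < scores[$1] }),
              best < vocabulary.count else { break }
        if best == endToken { break }
        tokens.append(best)
    }
    // Drop the start token and join the remaining words.
    return tokens.dropFirst().map { vocabulary[$0] }.joined(separator: " ")
}

// Toy stand-in for a real decoder: it ignores the image and replays a script.
let vocabulary = ["<start>", "<end>", "a", "dog", "runs"]
let script = [2, 3, 4, 1]
let toyStep: DecoderStep = { _, tokensSoFar in
    var scores = [Float](repeating: 0, count: vocabulary.count)
    let position = tokensSoFar.count - 1
    scores[position < script.count ? script[position] : 1] = 1
    return scores
}

let caption = greedyCaption(imageFeatures: [],
                            vocabulary: vocabulary,
                            startToken: 0,
                            endToken: 1,
                            step: toyStep)
print(caption) // prints "a dog runs"
```

The loop itself does not depend on how the scores are produced, so the same structure would apply whether the decoder is one model or a pair of encoder and decoder models.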

I know how to get the dominant objects in an image using the GoogLeNetPlaces, MobileNet, and SqueezeNet models.
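
For the "list of words or tags" fallback, the output of such a classifier can already be reduced to tags. The sketch below assumes the model's prediction has been read into a label-to-probability dictionary; the function name `tags(from:minConfidence:limit:)`, the threshold, and the sample values are all hypothetical and chosen only for illustration.

```swift
/// Turns a classifier's label-to-probability output into a short tag list:
/// keep labels at or above `minConfidence`, order by descending probability
/// (ties broken alphabetically), and return at most `limit` labels.
func tags(from probabilities: [String: Double],
          minConfidence: Double = 0.1,
          limit: Int = 5) -> [String] {
    return probabilities
        .filter { $0.value >= minConfidence }
        .sorted { a, b in
            a.value != b.value ? a.value > b.value : a.key < b.key
        }
        .prefix(limit)
        .map { $0.key }
}

// Made-up sample output, not taken from a real model run.
let sampleOutput = ["beach": 0.62, "palm tree": 0.21, "sky": 0.12, "car": 0.03]
print(tags(from: sampleOutput)) // prints ["beach", "palm tree", "sky"]
```

This yields tags rather than a sentence, so it only covers the weaker of the two requirements stated above.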

This example shows Image Detection (not Image Captioning).

During my research, I found these Image Captioning solutions and articles, but none of them provides a .mlmodel that can be used to achieve Image Captioning.

Check these examples:

I have not found any working or existing .mlmodel that allows me to do Image Captioning in iOS.

I know that Caffe and Keras models can be converted to .mlmodel, but I haven't found any model that does what I need.

Image Captioning Examples:

I need functionality similar to this.

I would appreciate any answers, links, or other help with achieving Image Captioning in iOS.

Core ML

Posted by

Adelmaer

Answer 1

These examples work:

The Fritz pod dependencies should be updated:

pod "Fritz", "7.0.1"
pod "Fritz/VisionPoseModel/Human/Fast", "7.0.1"

More information here: https://github.com/fritzlabs/fritz-ai-ios-sdk

However, there are some issues with accuracy.

Posted by

mad-d
