For my iOS app, I need an Image Captioning model (.mlmodel file) that will return text information about what is shown on the input image or maybe a list of words or tags describing what is shown on the input image.
.mlmodel
should take an image as an input and return text.I know how to get the dominant objects on the image using GoogLeNetPlaces, MobileNet, SqueezeNet models.
this example shows Image Detection (Not Image Captioning).
During my research, I've found these Image Captioning solutions and articles, but none of them provides a
.mlmodel
to work with to achieve Image Captioning.Check these examples:
- Show and Tell: A Neural Image Caption Generator
- A neural image contextualised caption generator based on CoreML
- Neuraltalk 2, Image Captioning Model, in PyTorch
- Tensorflow implementation of Show, Attend and Tell
- Image-Captioning using InceptionV3 and Beam Search
- There is also a hashtag prediction git here
I have not found any working or existing
.mlmodel
that allows me to do Image Captioning in iOS.I know that Caffee and Keras models can be converted to mlmodel but: I didn't find any model that allows doing what I need.
Image Captioning Examples:
Need functionality similar to this
I would appreciate any answers, links and help that can help to achieve Image Captioning in iOS.