Implementation of a Neural Image Captioning model using Keras with the Theano backend
This repository contains an implementation of neural image captioning (CNN + RNN) in the spirit of [1] and [2]: the model first extracts an image feature vector with a CNN and then generates a caption word by word with an RNN. The CNN is VGG16 and the RNN is a standard LSTM.
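As a rough illustration of this architecture (not the repository's exact code), the sketch below builds a VGG16 feature extractor and a small LSTM decoder with the Keras functional API. The vocabulary size, caption length, and embedding size are placeholder values, and the snippet targets the Keras 2 API; a Theano-era Keras 1 setup would use `merge(..., mode='sum')` instead of `add`.

```python
from keras.applications.vgg16 import VGG16
from keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from keras.models import Model

# Placeholder hyperparameters -- not taken from this repository.
VOCAB_SIZE = 8000   # caption vocabulary size
MAX_LEN = 34        # longest caption length in tokens
EMBED_DIM = 256

# CNN encoder: VGG16 without its softmax layer; the 4096-d fc2
# activations serve as the image feature.
vgg = VGG16(weights='imagenet')
feature_extractor = Model(inputs=vgg.input, outputs=vgg.get_layer('fc2').output)

# RNN decoder: image feature + partial caption -> next-word distribution.
img_input = Input(shape=(4096,))
img_embed = Dense(EMBED_DIM, activation='relu')(Dropout(0.5)(img_input))

cap_input = Input(shape=(MAX_LEN,))
cap_embed = Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(cap_input)
cap_state = LSTM(EMBED_DIM)(Dropout(0.5)(cap_embed))

merged = add([img_embed, cap_state])
hidden = Dense(EMBED_DIM, activation='relu')(merged)
output = Dense(VOCAB_SIZE, activation='softmax')(hidden)

caption_model = Model(inputs=[img_input, cap_input], outputs=output)
caption_model.compile(loss='categorical_crossentropy', optimizer='adam')
```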
Captions for test images are predicted with either normal (greedy) sampling or beam search.
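Both decoding strategies can be sketched as follows. Here `model` is a trained decoder like the one above, `photo` is a (1, 4096) VGG16 feature vector, and `word_to_idx` / `idx_to_word` are hypothetical vocabulary mappings with `startseq` / `endseq` markers; the repository's own names and tokens may differ.

```python
import numpy as np
from keras.preprocessing.sequence import pad_sequences

def greedy_caption(model, photo, word_to_idx, idx_to_word, max_len):
    """Normal (greedy) sampling: take the most probable word at every step."""
    tokens = ['startseq']
    for _ in range(max_len):
        seq = pad_sequences([[word_to_idx[w] for w in tokens if w in word_to_idx]],
                            maxlen=max_len)
        probs = model.predict([photo, seq], verbose=0)[0]
        word = idx_to_word[int(np.argmax(probs))]
        tokens.append(word)
        if word == 'endseq':
            break
    return ' '.join(w for w in tokens if w not in ('startseq', 'endseq'))

def beam_search_caption(model, photo, word_to_idx, idx_to_word, max_len, beam_width=3):
    """Beam search: keep the beam_width highest-scoring partial captions."""
    beams = [(['startseq'], 0.0)]                    # (tokens, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == 'endseq':               # finished caption, keep as-is
                candidates.append((tokens, score))
                continue
            seq = pad_sequences([[word_to_idx[w] for w in tokens if w in word_to_idx]],
                                maxlen=max_len)
            probs = model.predict([photo, seq], verbose=0)[0]
            for idx in np.argsort(probs)[-beam_width:]:
                candidates.append((tokens + [idx_to_word[int(idx)]],
                                   score + np.log(probs[idx] + 1e-12)))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]
    best = max(beams, key=lambda b: b[1])[0]
    return ' '.join(w for w in best if w not in ('startseq', 'endseq'))
```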
The model is trained on the Flickr8k dataset.
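Flickr8k ships five reference captions per image in a tab-separated token file. A minimal loader might look like the sketch below; the file name and the `startseq` / `endseq` markers are assumptions, not taken from this repository.

```python
from collections import defaultdict

def load_flickr8k_captions(token_file='Flickr8k.token.txt'):
    """Parse Flickr8k's token file: '<image>.jpg#<n><TAB><caption>' per line,
    returning a dict that maps each image id to its list of captions."""
    captions = defaultdict(list)
    with open(token_file) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            image_id, caption = line.split('\t', 1)
            image_id = image_id.split('#')[0]        # drop the '#0'..'#4' suffix
            captions[image_id].append('startseq ' + caption.lower() + ' endseq')
    return captions
```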
[1] Deep Visual-Semantic Alignments for Generating Image Descriptions (Karpathy et al., CVPR 2015)
[2] Show and Tell: A Neural Image Caption Generator (Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, CVPR 2015)
[3] CS231n: Convolutional Neural Networks for Visual Recognition (instructors: Fei-Fei Li, Andrej Karpathy, Justin Johnson)