Video captioning with attention in TensorFlow

S2VT-seq2seq-video-captioning-attention

Model Architecture: S2VT

This project uses the S2VT seq2seq model with attention for the video captioning task.
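As a rough illustration of what attention adds to a seq2seq decoder, the sketch below computes attention weights over per-frame encoder states and the resulting context vector. It is not taken from this repository: the dot-product scoring and the names `softmax` and `attend` are assumptions chosen for illustration, and the actual model may score frames differently.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Hypothetical dot-product attention over per-frame encoder states.

    decoder_state: list of floats (current decoder hidden state).
    encoder_states: list of lists of floats, one vector per video frame.
    Returns (weights, context), where context is the weighted sum of
    the encoder states.
    """
    # Score each frame by its dot product with the decoder state.
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: attention-weighted average of the frame vectors.
    context = [
        sum(w * h[i] for w, h in zip(weights, encoder_states)) for i in range(dim)
    ]
    return weights, context
```

For example, `attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])` puts more weight on the first frame, because its vector is more similar to the decoder state.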

Download the MSVD dataset: 1450 videos for training and 100 videos for testing.
Link: https://drive.google.com/file/d/0B18IKlS3niGFNlBoaHJTY3NXUkE/view (provided by MLDS2017). Put the MSVD dataset under the video-captioning folder.
Create a .txt file named “testing_output.txt”. (MLDS2017 also had a peer review section, so if you want to do the peer review in class, create another .txt file named “sample_output_peer_review.txt”.)
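The setup steps above can be checked with a short script. This is only a sketch: the helper name `prepare_workspace` is hypothetical, and the default paths assume the dataset folder is "./MLDS_hw2_data" and the output file is "./testing_output.txt", as used in the run instructions.

```python
from pathlib import Path


def prepare_workspace(data_dir="./MLDS_hw2_data", output_file="./testing_output.txt"):
    """Create the empty output file and report whether the dataset folder exists.

    Both default paths are assumptions based on the run instructions;
    adjust them if your layout differs.
    """
    data_path = Path(data_dir)
    out_path = Path(output_file)
    # Make sure the folder for the output file exists, then create the file.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.touch(exist_ok=True)
    return data_path.is_dir(), out_path


if __name__ == "__main__":
    found, out_path = prepare_workspace()
    if not found:
        print("Dataset folder not found; download MSVD and place it in the main folder.")
    print("Output file ready:", out_path)
```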

Run the shell script

./s2vt_predict.sh [data dir] [output filename]

[data dir] should be “./MLDS_hw2_data” (the dataset placed under the main folder), and [output filename] should be “./testing_output.txt”.
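If you prefer to launch the script from Python, the invocation above could be wrapped as follows. This is a minimal sketch: `build_predict_command` and `run_prediction` are hypothetical helpers, not part of the repository, and the only thing assumed about s2vt_predict.sh is that it takes the two positional arguments shown above.

```python
import subprocess
from pathlib import Path


def build_predict_command(
    data_dir="./MLDS_hw2_data",
    output_file="./testing_output.txt",
    script="./s2vt_predict.sh",
):
    """Return the argument list for: ./s2vt_predict.sh [data dir] [output filename]."""
    return [script, data_dir, output_file]


def run_prediction(data_dir="./MLDS_hw2_data", output_file="./testing_output.txt"):
    """Run the prediction script and raise if it is missing or exits with an error."""
    cmd = build_predict_command(data_dir, output_file)
    if not Path(cmd[0]).is_file():
        raise FileNotFoundError(f"{cmd[0]} not found; run this from the main folder")
    # check=True raises CalledProcessError on a non-zero exit code.
    return subprocess.run(cmd, check=True)
```

Calling `run_prediction()` from the main folder is then equivalent to running the shell command with the default arguments.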

Usage for training: set mode = 0 (line 230 in s2vt_predict.py).

Example of a generated caption: a man is dancing.