Project author: Kyung-Min

Project description: https://arxiv.org/abs/1707.00836
Primary language: Jupyter Notebook
Repository: git://github.com/Kyung-Min/Deep-Embedded-Memory-Networks.git
Created: 2017-07-21T06:10:48Z
Project community: https://github.com/Kyung-Min/Deep-Embedded-Memory-Networks


Deep-Embedded-Memory-Networks (Keras version)

Authors: Kyung-Min Kim, Min-Oh Heo, Seong-Ho Choi, and Byoung-Tak Zhang (Seoul National University & Surromind Robotics)
Paper: DeepStory: Video Story QA by Deep Embedded Memory Networks (https://arxiv.org/abs/1707.00836) (IJCAI 2017)

This notebook shows how the DEMN works. The DEMN consists of three modules: video story understanding, story selection, and answer selection. The code here implements the two QA modules (story selection and answer selection).

PororoQA dataset release: https://github.com/Kyung-Min/PororoQA

```python
from __future__ import print_function
from __future__ import division

import numpy as np
import sys
import utils
import keras.activations as activations
from keras.models import Model
from keras.regularizers import l2
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.layers import Input, TimeDistributed
from keras.layers.merge import concatenate, add, multiply
from keras.layers.embeddings import Embedding
from keras.layers.convolutional import Convolution1D, MaxPooling1D
from keras.layers.core import Activation, Dense, Dropout, Flatten, Lambda, Permute, RepeatVector
from keras.layers.recurrent import GRU, LSTM
from keras import backend as K
import csv


def config():
    c = dict()

    # embedding params
    c['emb'] = 'Glove'
    c['embdim'] = 300
    c['inp_e_dropout'] = 1/2

    # objective function
    c['loss'] = 'ranking_loss'
    c['margin'] = 1

    # training hyperparams
    c['opt'] = 'adam'
    c['batch_size'] = 160
    c['epochs'] = 16

    # sentences shorter than 'pad' words are zero-padded up to this length
    c['pad'] = 60

    # scoring function: word-level attention-based model
    c['dropout'] = 1/2
    c['dropoutfix_inp'] = 0
    c['dropoutfix_rec'] = 0
    c['l2reg'] = 1e-4
    c['rnnbidi'] = True
    c['rnn'] = GRU
    c['rnnbidi_mode'] = add
    c['rnnact'] = 'tanh'
    c['rnninit'] = 'glorot_uniform'
    c['sdim'] = 1
    c['pool_layer'] = MaxPooling1D
    c['cnnact'] = 'tanh'
    c['cnninit'] = 'glorot_uniform'
    c['cdim'] = 2
    c['cfiltlen'] = 3
    c['adim'] = 1/2

    # mlp scoring function
    c['Ddim'] = 2

    ps, h = utils.hash_params(c)
    return c, ps, h


conf = None
emb = None
vocab = None
inp_tr = None
inp_val = None
inp_test = None
y_val = None
y_test = None
```

Data Load

The provided data contain the output of the video story understanding module, i.e. the reconstructed story sentences $s_i$, where

$$s_i = d_i \,\|\, c_i$$

  • $d_i$ is the description of the $i$-th video scene, retrieved by the video story understanding module
  • $c_i$ is the subtitle of the $i$-th video scene
  • $\|$ denotes concatenation

For example, $s_i$ can be ‘there are three friends on the ground. the friends are talking about the new house.’
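
As a minimal sketch of what this reconstruction amounts to (the helper below is purely illustrative and not part of the DEMN code; the example strings are the ones from the text above), a story sentence is simply the scene description and the subtitle concatenated:

```python
# Illustrative only: build a reconstructed story sentence s_i = d_i || c_i
# by concatenating a scene description with the scene's subtitle.
def reconstruct_story_sentence(description, subtitle):
    return description.strip() + ' ' + subtitle.strip()

d_i = 'there are three friends on the ground.'
c_i = 'the friends are talking about the new house.'
print(reconstruct_story_sentence(d_i, c_i))
# there are three friends on the ground. the friends are talking about the new house.
```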

```python
'''
The format of the dataset is as follows.

Training dataset:
question1, positive story sentence, negative story sentence1, positive answer sentence, negative answer sentence1
... , negative answer sentence2
... , negative answer sentence3
... , negative answer sentence4
question1, positive story sentence, negative story sentence2, positive answer sentence, negative answer sentence1
... , negative answer sentence2
... , negative answer sentence3
... , negative answer sentence4
...
question2, positive story sentence, negative story sentence1, positive answer sentence, negative answer sentence1
... , negative answer sentence2
... , negative answer sentence3
... , negative answer sentence4
question2, positive story sentence, negative story sentence2, positive answer sentence, negative answer sentence1
... , negative answer sentence2
... , negative answer sentence3
... , negative answer sentence4

Validation & test dataset:
question1, label for story sentence, story sentence, dummy, positive answer sentence, negative answer sentence1
... , negative answer sentence2
... , negative answer sentence3
... , negative answer sentence4
question1, label for story sentence, story sentence, dummy, positive answer sentence, negative answer sentence1
... , negative answer sentence2
... , negative answer sentence3
... , negative answer sentence4
...
question2, label for story sentence, story sentence, dummy, positive answer sentence, negative answer sentence1
... , negative answer sentence2
... , negative answer sentence3
... , negative answer sentence4
question2, label for story sentence, story sentence, dummy, positive answer sentence, negative answer sentence1
... , negative answer sentence2
... , negative answer sentence3
... , negative answer sentence4
'''

def load_data_from_file(dsfile, iseval):
    # load a dataset in the csv format
    q = []      # a set of questions
    s_p = []    # at training time, a set of positive story sentences; otherwise, a set of story sentences
    s_n = []    # at training time, a set of negative story sentences; otherwise, a set of dummy sentences
    q_sp = []   # a set of sentences which concatenate questions and positive story sentences
    a_p = []    # a set of positive answers
    a_n = []    # a set of negative answers
    labels = []
    with open(dsfile) as f:
        c = csv.DictReader(f)
        for l in c:
            if iseval:
                label = int(l['label'])
                labels.append(label)
            try:
                qtext = l['qtext'].decode('utf8')
                s_p_text = l['atext1'].decode('utf8')
                s_n_text = l['atext2'].decode('utf8')
                a_p_text = l['a1'].decode('utf8')
                a_n_text = l['a2'].decode('utf8')
            except AttributeError:  # python3 strings have no .decode()
                qtext = l['qtext']
                s_p_text = l['atext1']
                s_n_text = l['atext2']
                a_p_text = l['a1']
                a_n_text = l['a2']
            a_p.append(a_p_text.split(' '))
            a_n.append(a_n_text.split(' '))
            q.append(qtext.split(' '))
            s_p.append(s_p_text.split(' '))
            s_n.append(s_n_text.split(' '))
            q_sp.append(qtext.split(' ') + s_p_text.split(' '))
    if iseval:
        return (q, s_p, s_n, q_sp, a_p, a_n, np.array(labels))
    else:
        return (q, s_p, s_n, q_sp, a_p, a_n)

def make_model_inputs(qi, si_p, si_n, qi_si, ai_p, ai_n, f01, f10, f02, f20, f31, f13, f32, f23,
                      q, s_p, s_n, q_sp, a_p, a_n, y=None):
    inp = {'qi': qi, 'si_p': si_p, 'si_n': si_n, 'qi_si': qi_si, 'ai_p': ai_p,
           'ai_n': ai_n, 'f01': f01, 'f10': f10, 'f02': f02, 'f20': f20, 'f31': f31,
           'f13': f13, 'f32': f32, 'f23': f23,
           'q': q, 's_p': s_p, 's_n': s_n, 'q_sp': q_sp, 'a_p': a_p, 'a_n': a_n}
    if y is not None:
        inp['y'] = y
    return inp

def load_set(fname, vocab=None, iseval=False):
    if iseval:
        q, s_p, s_n, q_sp, a_p, a_n, y = load_data_from_file(fname, iseval)
    else:
        q, s_p, s_n, q_sp, a_p, a_n = load_data_from_file(fname, iseval)
        vocab = utils.Vocabulary(q + s_p + s_n + a_p + a_n)
    pad = conf['pad']
    qi = vocab.vectorize(q, pad=pad)
    si_p = vocab.vectorize(s_p, pad=pad)
    si_n = vocab.vectorize(s_n, pad=pad)
    qi_si = vocab.vectorize(q_sp, pad=pad)
    ai_p = vocab.vectorize(a_p, pad=pad)
    ai_n = vocab.vectorize(a_n, pad=pad)
    f01, f10 = utils.sentence_flags(q, s_p, pad)
    f02, f20 = utils.sentence_flags(q, s_n, pad)
    f31, f13 = utils.sentence_flags(q_sp, a_p, pad)
    f32, f23 = utils.sentence_flags(q_sp, a_n, pad)
    if iseval:
        inp = make_model_inputs(qi, si_p, si_n, qi_si, ai_p, ai_n, f01, f10, f02, f20,
                                f31, f13, f32, f23, q, s_p, s_n, q_sp, a_p, a_n, y=y)
        return (inp, y)
    else:
        inp = make_model_inputs(qi, si_p, si_n, qi_si, ai_p, ai_n, f01, f10, f02, f20,
                                f31, f13, f32, f23, q, s_p, s_n, q_sp, a_p, a_n)
        return (inp, vocab)


def load_data(trainf, valf, testf):
    global vocab, inp_tr, inp_val, inp_test, y_val, y_test
    inp_tr, vocab = load_set(trainf, iseval=False)
    inp_val, y_val = load_set(valf, vocab=vocab, iseval=True)
    inp_test, y_test = load_set(testf, vocab=vocab, iseval=True)


def embedding():
    '''
    Declare all inputs (vectorized sentences and NLP flags)
    and generate outputs representing vector sequences with dropout applied.
    Returns the vector dimensionality.
    '''
    pad = conf['pad']
    dropout = conf['inp_e_dropout']

    # story selection
    input_qi = Input(name='qi', shape=(pad,), dtype='int32')
    input_si_p = Input(name='si_p', shape=(pad,), dtype='int32')
    input_f01 = Input(name='f01', shape=(pad, utils.flagsdim))
    input_f10 = Input(name='f10', shape=(pad, utils.flagsdim))
    input_si_n = Input(name='si_n', shape=(pad,), dtype='int32')
    input_f02 = Input(name='f02', shape=(pad, utils.flagsdim))
    input_f20 = Input(name='f20', shape=(pad, utils.flagsdim))

    # answer selection
    input_qi_si = Input(name='qi_si', shape=(pad,), dtype='int32')
    input_ai_p = Input(name='ai_p', shape=(pad,), dtype='int32')
    input_f31 = Input(name='f31', shape=(pad, utils.flagsdim))
    input_f13 = Input(name='f13', shape=(pad, utils.flagsdim))
    input_ai_n = Input(name='ai_n', shape=(pad,), dtype='int32')
    input_f32 = Input(name='f32', shape=(pad, utils.flagsdim))
    input_f23 = Input(name='f23', shape=(pad, utils.flagsdim))

    input_nodes = [input_qi, input_si_p, input_f01, input_f10, input_si_n,
                   input_f02, input_f20, input_qi_si, input_ai_p, input_f31, input_f13,
                   input_ai_n, input_f32, input_f23]

    N = emb.N + utils.flagsdim
    shared_embedding = Embedding(name='emb', input_dim=vocab.size(), input_length=pad,
                                 output_dim=emb.N, mask_zero=False,
                                 weights=[vocab.embmatrix(emb)], trainable=True)
    emb_qi_p = Dropout(dropout, noise_shape=(N,))(concatenate([shared_embedding(input_qi), input_f01]))
    emb_si_p = Dropout(dropout, noise_shape=(N,))(concatenate([shared_embedding(input_si_p), input_f10]))
    emb_qi_n = Dropout(dropout, noise_shape=(N,))(concatenate([shared_embedding(input_qi), input_f02]))
    emb_si_n = Dropout(dropout, noise_shape=(N,))(concatenate([shared_embedding(input_si_n), input_f20]))
    emb_qi_si_p = Dropout(dropout, noise_shape=(N,))(concatenate([shared_embedding(input_qi_si), input_f31]))
    emb_ai_p = Dropout(dropout, noise_shape=(N,))(concatenate([shared_embedding(input_ai_p), input_f13]))
    emb_qi_si_n = Dropout(dropout, noise_shape=(N,))(concatenate([shared_embedding(input_qi_si), input_f32]))
    emb_ai_n = Dropout(dropout, noise_shape=(N,))(concatenate([shared_embedding(input_ai_n), input_f23]))

    emb_outputs = [emb_qi_p, emb_si_p, emb_qi_n, emb_si_n, emb_qi_si_p, emb_ai_p, emb_qi_si_n, emb_ai_n]
    return N, input_nodes, emb_outputs
```

Scoring Function

To handle long sentences, a word-level attention-based model is used as the scoring functions G and H.

The model builds embeddings of two token sequences X and Y. Each token of X and Y is encoded with a bidirectional RNN (a GRU in this configuration), and the sentence vector $x$ of X is computed by applying a convolution over the bidirectional RNN outputs on the X side. Each token vector of Y is then multiplied by a softmax weight determined by $x$:

$$m_t = \tanh(W_x x + W_y y_t)$$

$$\alpha_t = \operatorname{softmax}\!\left(w^\top m_t\right)$$

$$\tilde{y}_t = \alpha_t \, y_t$$

where

  • $y_t$ is the $t$-th token vector on the Y side,
  • $\tilde{y}_t$ is the updated $t$-th token vector,
  • $W_x$, $W_y$, and $w$ are attention parameters.
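
The following standalone numpy sketch (toy shapes and randomly initialized parameters, independent of the Keras implementation below) shows how the three equations act on a short token sequence:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.RandomState(0)
T, d, a = 5, 4, 3          # sequence length, token dim, attention dim (toy sizes)
x = rng.randn(d)           # sentence vector of X (from the convolution/pooling step)
Y = rng.randn(T, d)        # token vectors y_t on the Y side
W_x = rng.randn(a, d)      # attention parameters (illustrative random values)
W_y = rng.randn(a, d)
w = rng.randn(a)

M = np.tanh(Y @ W_y.T + W_x @ x)   # m_t = tanh(W_x x + W_y y_t), one row per t
alpha = softmax(M @ w)             # alpha_t = softmax over t of w^T m_t
Y_tilde = alpha[:, None] * Y       # y~_t = alpha_t * y_t, re-weighted token vectors
print(alpha.round(3), Y_tilde.shape)
```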
```python
def attention_model(input_nodes, N, pfx=''):
    # apply the bidirectional RNN on each sentence X, Y
    qpos, pos, qneg, neg = rnn_input(N, pfx=pfx, dropout=conf['dropout'], dropoutfix_inp=conf['dropoutfix_inp'],
                                     dropoutfix_rec=conf['dropoutfix_rec'], sdim=conf['sdim'],
                                     rnnbidi_mode=conf['rnnbidi_mode'], rnn=conf['rnn'], rnnact=conf['rnnact'],
                                     rnninit=conf['rnninit'], inputs=input_nodes)
    # calculate the sentence vector on the X side using a convolutional layer
    qpos_aggreg, qneg_aggreg, gwidth = aggregate(qpos, qneg, 'aggre_q'+pfx, N,
                                                 dropout=conf['dropout'], l2reg=conf['l2reg'],
                                                 sdim=conf['sdim'], cnnact=conf['cnnact'], cdim=conf['cdim'],
                                                 cfiltlen=conf['cfiltlen'])
    # re-embed X, Y in attention space
    awidth = int(N*conf['adim'])
    shared_dense_q = Dense(awidth, name='attn_proj_q'+pfx, kernel_regularizer=l2(conf['l2reg']))
    qpos_aggreg_attn = shared_dense_q(qpos_aggreg)
    qneg_aggreg_attn = shared_dense_q(qneg_aggreg)
    shared_dense_s = Dense(awidth, name='attn_proj_s'+pfx, kernel_regularizer=l2(conf['l2reg']))
    pos_attn = TimeDistributed(shared_dense_s)(pos)
    neg_attn = TimeDistributed(shared_dense_s)(neg)
    # apply an attention function on the Y side, producing a vector of scalars denoting the attention for each token
    pos_foc, neg_foc = focus(N, qpos_aggreg_attn, qneg_aggreg_attn, pos_attn, neg_attn,
                             pos, neg, conf['sdim'], awidth,
                             conf['l2reg'], pfx=pfx)
    # calculate the sentence vector on the Y side using a convolutional layer
    pos_aggreg, neg_aggreg, gwidth = aggregate(pos_foc, neg_foc, 'aggre_s'+pfx, N,
                                               dropout=conf['dropout'], l2reg=conf['l2reg'], sdim=conf['sdim'],
                                               cnnact=conf['cnnact'], cdim=conf['cdim'], cfiltlen=conf['cfiltlen'])
    return ([qpos_aggreg, pos_aggreg], [qneg_aggreg, neg_aggreg])


def rnn_input(N, dropout=3/4, dropoutfix_inp=0, dropoutfix_rec=0,
              sdim=2, rnn=GRU, rnnact='tanh', rnninit='glorot_uniform', rnnbidi_mode=add,
              inputs=None, pfx=''):
    if rnnbidi_mode == 'concat':
        sdim /= 2
    shared_rnn_f = rnn(int(N*sdim), kernel_initializer=rnninit, input_shape=(None, conf['pad'], N),
                       activation=rnnact, return_sequences=True, dropout=dropoutfix_inp,
                       recurrent_dropout=dropoutfix_rec, name='rnnf'+pfx)
    shared_rnn_b = rnn(int(N*sdim), kernel_initializer=rnninit, input_shape=(None, conf['pad'], N),
                       activation=rnnact, return_sequences=True, dropout=dropoutfix_inp,
                       recurrent_dropout=dropoutfix_rec, go_backwards=True, name='rnnb'+pfx)
    qpos_f = shared_rnn_f(inputs[0])
    pos_f = shared_rnn_f(inputs[1])
    qneg_f = shared_rnn_f(inputs[2])
    neg_f = shared_rnn_f(inputs[3])
    qpos_b = shared_rnn_b(inputs[0])
    pos_b = shared_rnn_b(inputs[1])
    qneg_b = shared_rnn_b(inputs[2])
    neg_b = shared_rnn_b(inputs[3])
    qpos = Dropout(dropout, noise_shape=(conf['pad'], int(N*sdim)))(rnnbidi_mode([qpos_f, qpos_b]))
    pos = Dropout(dropout, noise_shape=(conf['pad'], int(N*sdim)))(rnnbidi_mode([pos_f, pos_b]))
    qneg = Dropout(dropout, noise_shape=(conf['pad'], int(N*sdim)))(rnnbidi_mode([qneg_f, qneg_b]))
    neg = Dropout(dropout, noise_shape=(conf['pad'], int(N*sdim)))(rnnbidi_mode([neg_f, neg_b]))
    return (qpos, pos, qneg, neg)


def aggregate(in1, in2, pfx, N, dropout, l2reg, sdim, cnnact, cdim, cfiltlen):
    '''
    In the paper, the sentence vector was calculated by simple averaging,
    but we use a convolutional layer in this demo.
    '''
    shared_conv = Convolution1D(name=pfx+'c', input_shape=(conf['pad'], int(N*sdim)), kernel_size=cfiltlen,
                                filters=int(N*cdim), activation=cnnact, kernel_regularizer=l2(l2reg))
    aggreg1 = shared_conv(in1)
    aggreg2 = shared_conv(in2)
    nsteps = conf['pad'] - cfiltlen + 1
    width = int(N*cdim)
    aggreg1, aggreg2 = pool(pfx, aggreg1, aggreg2, nsteps, width, dropout=dropout)
    return (aggreg1, aggreg2, width)


def pool(pfx, in1, in2, nsteps, width, dropout):
    pooling = MaxPooling1D(pool_size=nsteps, name=pfx+'pool[0]')
    out1 = pooling(in1)
    out2 = pooling(in2)
    flatten = Flatten(name=pfx+'pool[1]')
    out1 = Dropout(dropout, noise_shape=(1, width))(flatten(out1))
    out2 = Dropout(dropout, noise_shape=(1, width))(flatten(out2))
    return (out1, out2)


def focus(N, input_aggreg1, input_aggreg2, input_seq1, input_seq2, orig_seq1, orig_seq2,
          sdim, awidth, l2reg, pfx=''):
    repeat_vec = RepeatVector(conf['pad'], name='input_aggreg1_rep'+pfx)
    input_aggreg1_rep = repeat_vec(input_aggreg1)
    input_aggreg2_rep = repeat_vec(input_aggreg2)
    attn1 = Activation('tanh')(add([input_aggreg1_rep, input_seq1]))
    attn2 = Activation('tanh')(add([input_aggreg2_rep, input_seq2]))
    shared_dense = Dense(1, kernel_regularizer=l2(l2reg), name='focus1'+pfx)
    attn1 = TimeDistributed(shared_dense)(attn1)
    attn2 = TimeDistributed(shared_dense)(attn2)
    flatten = Flatten(name='attn_flatten'+pfx)
    attn1 = flatten(attn1)
    attn2 = flatten(attn2)
    attn1 = Activation('softmax')(attn1)
    attn1 = RepeatVector(int(N*sdim))(attn1)
    attn1 = Permute((2, 1))(attn1)
    output1 = multiply([orig_seq1, attn1])
    attn2 = Activation('softmax')(attn2)
    attn2 = RepeatVector(int(N*sdim))(attn2)
    attn2 = Permute((2, 1))(attn2)
    output2 = multiply([orig_seq2, attn2])
    return (output1, output2)
```

To compare two sentence vectors, the paper uses a cosine similarity measure, but this demo uses an MLP similarity function.
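
For reference, the cosine similarity used in the paper to compare two sentence vectors is the standard one; a minimal numpy sketch (independent of the Keras model below):

```python
import numpy as np

def cosine_similarity(u, v, eps=1e-8):
    # cos(u, v) = u.v / (||u|| ||v||); eps guards against zero vectors
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + eps))

print(cosine_similarity(np.array([1.0, 0.0, 1.0]), np.array([0.5, 0.5, 1.0])))
```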

```python
def mlp_ptscorer(inputs1, inputs2, Ddim, N, l2reg, pfx='out', oact='sigmoid', extra_inp=[]):
    """ Element-wise features from the pair fed to an MLP. """
    sum1 = add(inputs1)
    sum2 = add(inputs2)
    mul1 = multiply(inputs1)
    mul2 = multiply(inputs2)
    mlp_input1 = concatenate([sum1, mul1])
    mlp_input2 = concatenate([sum2, mul2])
    # Ddim may be either 0 (no hidden layer), a scalar (single hidden layer) or
    # a list (multiple hidden layers)
    if Ddim == 0:
        Ddim = []
    elif not isinstance(Ddim, list):
        Ddim = [Ddim]
    if Ddim:
        for i, D in enumerate(Ddim):
            shared_dense = Dense(int(N*D), kernel_regularizer=l2(l2reg),
                                 activation='tanh', name=pfx+'hdn[%d]' % (i,))
            mlp_input1 = shared_dense(mlp_input1)
            mlp_input2 = shared_dense(mlp_input2)
    shared_dense = Dense(1, kernel_regularizer=l2(l2reg), activation=oact, name=pfx+'mlp')
    mlp_out1 = shared_dense(mlp_input1)
    mlp_out2 = shared_dense(mlp_input2)
    return [mlp_out1, mlp_out2]
```

Model Architecture

```python
def build_model():
    # input embedding
    N, input_nodes_emb, output_nodes_emb = embedding()
    # story selection
    ptscorer_inputs1, ptscorer_inputs2 = attention_model(output_nodes_emb[:4], N, pfx='S')
    scoreS1, scoreS2 = mlp_ptscorer(ptscorer_inputs1, ptscorer_inputs2, conf['Ddim'], N,
                                    conf['l2reg'], pfx='outS', oact='sigmoid')
    # answer selection
    ptscorer_inputs3, ptscorer_inputs4 = attention_model(output_nodes_emb[4:], N, pfx='A')
    scoreA1, scoreA2 = mlp_ptscorer(ptscorer_inputs3, ptscorer_inputs4, conf['Ddim'], N,
                                    conf['l2reg'], pfx='outA', oact='sigmoid')
    output_nodes = [scoreS1, scoreS2, scoreA1, scoreA2]
    model = Model(inputs=input_nodes_emb, outputs=output_nodes)
    model.compile(loss=ranking_loss, optimizer=conf['opt'])
    return model
```

Loss Function

Training is performed with a hinge rank loss over the two triplets (question, positive/negative story) and (question + story, positive/negative answer):

$$\mathcal{L} = \max\bigl(0,\ \alpha - G(q, s^*) + G(q, s_i)\bigr) + \max\bigl(0,\ \beta - H(q \,\|\, s^*, a^*) + H(q \,\|\, s^*, a_r)\bigr)$$

where

  • $s^*$ is the correct relevant story for q (the positive story sentence) and $s_i$ is a negative story sentence,
  • $a^*$ is the correct answer sentence for q and $a_r$ is a negative answer sentence,
  • $\alpha$ and $\beta$ are margins (both set to conf['margin'] in this demo).
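
As a quick numeric illustration (toy scores, not produced by the model), the two hinge terms with margin 1 combine like this:

```python
def hinge_rank_loss(posS, negS, posA, negA, margin=1.0):
    # max(0, margin + negative score - positive score), summed over the two triplets
    return max(0.0, margin + negS - posS) + max(0.0, margin + negA - posA)

# toy scores: the positive story is ranked comfortably above the negative one,
# but the positive answer only ties with the negative answer
print(hinge_rank_loss(posS=0.75, negS=0.25, posA=0.5, negA=0.5))  # 0.5 + 1.0 = 1.5
```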
```python
'''
The four model outputs fed to the loss:
posS: G(q, s^*)     score of the positive story sentence
negS: G(q, s_i)     score of a negative story sentence
posA: H(s_a, a^*)   score of the positive answer sentence, where s_a = q || s^*
negA: H(s_a, a_r)   score of a negative answer sentence
'''
def ranking_loss(y_true, y_pred):
    posS = y_pred[0]
    negS = y_pred[1]
    posA = y_pred[2]
    negA = y_pred[3]
    margin = conf['margin']
    loss = K.maximum(margin + negS - posS, 0.0) + K.maximum(margin + negA - posA, 0.0)
    return K.mean(loss, axis=-1)
```

Train and Evaluation

```python
def train_and_eval(runid):
    print('Model')
    model = build_model()
    print(model.summary())

    print('Training')
    fit_model(model, weightsf='weights-'+runid+'-bestval.h5')
    model.save_weights('weights-'+runid+'-final.h5', overwrite=True)
    model.load_weights('weights-'+runid+'-bestval.h5')

    print('Predict&Eval (best val epoch)')
    res = eval(model)


def fit_model(model, **kwargs):
    epochs = conf['epochs']
    callbacks = fit_callbacks(kwargs.pop('weightsf'))
    # The dummy targets are never used during the computation:
    # 'y_true' in ranking_loss does not participate in the calculation.
    dummy1 = np.ones((len(inp_tr['qi']), 1), dtype=np.float)
    dummy2 = np.ones((len(inp_val['qi']), 1), dtype=np.float)
    return model.fit(inp_tr, y=[dummy1, dummy1, dummy1, dummy1],
                     validation_data=[inp_val, [dummy2, dummy2, dummy2, dummy2]],
                     callbacks=callbacks, epochs=epochs)
```

At every epoch, a callback measures MRR and accuracy on the validation set.
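
For readers unfamiliar with the metric, mean reciprocal rank (MRR) is the average of 1/rank of the first correct candidate per question. The utils.AnsSelCB callback handles this for the model during training; the helper below is only an illustrative sketch:

```python
def mean_reciprocal_rank(ranked_labels):
    """ranked_labels: for each question, candidate labels sorted by model score
    (descending), with 1 marking a correct candidate. Illustrative helper only."""
    rr = []
    for labels in ranked_labels:
        # reciprocal rank of the first correct candidate, 0 if none is present
        rr.append(next((1.0 / (i + 1) for i, l in enumerate(labels) if l == 1), 0.0))
    return sum(rr) / len(rr)

# two questions: correct candidate ranked 1st and 3rd -> MRR = (1 + 1/3) / 2
print(mean_reciprocal_rank([[1, 0, 0], [0, 0, 1, 0]]))  # 0.666...
```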

```python
def fit_callbacks(weightsf):
    return [utils.AnsSelCB(inp_val['q'], inp_val['s_p'], inp_val['s_n'], inp_val['q_sp'],
                           inp_val['a_p'], inp_val['a_n'], y_val, inp_val),
            ModelCheckpoint(weightsf, save_best_only=True, monitor='acc', mode='max'),
            EarlyStopping(monitor='acc', mode='max', patience=12)]


def eval(model):
    res = []
    for inp in [inp_val, inp_test]:
        if inp is None:
            res.append(None)
            continue
        pred = model.predict(inp)
        ypredS = pred[0]
        ypredA1 = pred[2]
        ypredA2 = pred[3]
        res.append(utils.eval_QA(ypredS, ypredA1, ypredA2, inp['q'], inp['y'], MAP=False))
    return tuple(res)


if __name__ == "__main__":
    trainf = 'data/anssel/pororo/train_triplet_concat_a5_500.csv'
    valf = 'data/anssel/pororo/dev_triplet_concat_a5_for_mrr_500.csv'
    testf = 'data/anssel/pororo/dev_triplet_concat_a5_for_mrr_500.csv'
    params = []

    conf, ps, h = config()
    if conf['emb'] == 'Glove':
        print('GloVe')
        emb = utils.GloVe(N=conf['embdim'])
    print('Dataset')
    load_data(trainf, valf, testf)
    runid = 'DEMN-%x' % (h)
    print('RunID: %s (%s)' % (runid, ps))
    train_and_eval(runid)
```