Project author: Janus-Shiau

Project description:
Lookahead optimizer ("Lookahead Optimizer: k steps forward, 1 step back") for tensorflow
Language: Python
Repository: git://github.com/Janus-Shiau/lookahead_tensorflow.git
Created: 2019-07-31T04:58:47Z
Project community: https://github.com/Janus-Shiau/lookahead_tensorflow

License:


lookahead_tensorflow

Lookahead optimizer (“Lookahead Optimizer: k steps forward, 1 step back”) for tensorflow

Environment

This code is implemented and tested with tensorflow 1.11.0 and 1.13.0. \
I didn't use any special operators, so it should also work for other versions of tensorflow.

Usage

I didn't directly wrap the optimizer, but instead made the lookahead strategy independent. \
Thus, it's more flexible to decide what should be optimized with lookahead.
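For example, lookahead could be applied to only a subset of the trainable variables. A minimal sketch, assuming the `BaseLookAhead` constructor shown in the steps below (the scope name `decoder` is hypothetical):

```python
import tensorflow as tf
from lookahead_opt import BaseLookAhead

# Apply lookahead only to variables under a given scope.
# "decoder" is a hypothetical scope name used for illustration.
decoder_vars = [v for v in tf.trainable_variables()
                if v.op.name.startswith("decoder/")]
lookahead = BaseLookAhead(decoder_vars, k=5, alpha=0.5)
```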

  1. Instantiate the class after all variables have been initialized, and initialize the `BaseLookAhead` with all trainable variables.

     ```python
     import tensorflow as tf
     from lookahead_opt import BaseLookAhead

     """
     Build your model here.
     Please also include any optimizer you need.
     """

     model_vars = [v for v in tf.trainable_variables()]
     tf.global_variables_initializer().run()

     lookahead = BaseLookAhead(model_vars, k=5, alpha=0.5)
     ```

     Arguments are defined as follows:
     > `model_vars`: the variables to apply lookahead to. [list]\
     > `k`: the number of steps the fast weights go forward. [int]\
     > `alpha`: the learning rate for merging the slow weights with the fast weights. [float]

  2. Add the assign operators to the training operation, or run them directly in the session.

     Add to `train_op`:

     ```python
     train_op += lookahead.get_ops()
     ```

     Or just run them in the session:

     ```python
     with tf.Session() as sess:
         _ = sess.run(lookahead.get_ops())
     ```
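Putting both steps together, here is a minimal end-to-end sketch. The toy model (`x`, `y`, `w`, `loss`) and the training data are hypothetical; only `BaseLookAhead` and `get_ops()` come from this repo, and the ordering follows the steps above (initialize variables first, then attach lookahead):

```python
import numpy as np
import tensorflow as tf
from lookahead_opt import BaseLookAhead

# Hypothetical toy model: linear regression.
x = tf.placeholder(tf.float32, [None, 4])
y = tf.placeholder(tf.float32, [None, 1])
w = tf.get_variable("w", shape=[4, 1])
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))

# Create the optimizer as usual; lookahead is attached separately.
train_op = [tf.train.AdamOptimizer(1e-3).minimize(loss)]

with tf.Session() as sess:
    # Instantiate BaseLookAhead after variable initialization, as above.
    tf.global_variables_initializer().run()
    lookahead = BaseLookAhead(tf.trainable_variables(), k=5, alpha=0.5)
    train_op += lookahead.get_ops()

    data_x = np.random.randn(32, 4).astype(np.float32)
    data_y = np.random.randn(32, 1).astype(np.float32)
    for _ in range(100):
        sess.run(train_op, feed_dict={x: data_x, y: data_y})
```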

Implementation Details

Injecting Lookahead into the model and saving specific variables

The Lookahead machinery is wrapped in the default variable_scope "lookahead".
After calling BaseLookAhead with specific variables, those variables will be injected into the lookahead graph.\
Note that the lookahead class is completely separate from the optimizer, so remember to add an optimizer when creating the training graph.

[Figure: example template graph with lookahead]

The BaseLookAhead creates duplicate tf.Variables to store the slow weights,
and a counter is created automatically to carry out "k steps forward, 1 step back".

[Figure: example template graph with lookahead]
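For intuition, the lookahead rule keeps a slow copy φ of each fast weight θ; every k training steps it merges φ ← φ + α·(θ − φ) and resets θ to the new φ. A simplified sketch of how such bookkeeping can be built in TF 1.x graph code (an illustration under those assumptions, not the repo's exact implementation):

```python
import tensorflow as tf

def make_lookahead_ops(model_vars, k=5, alpha=0.5):
    """Illustrative lookahead bookkeeping (not the repo's exact code)."""
    ops = []
    with tf.variable_scope("lookahead"):
        # Counter driving "k steps forward, 1 step back".
        counter = tf.get_variable("counter", shape=[], dtype=tf.int64,
                                  initializer=tf.zeros_initializer(),
                                  trainable=False)
        step = tf.assign_add(counter, 1)
        merge_now = tf.equal(tf.mod(step, k), 0)

        for var in model_vars:
            # Duplicate variable holding the slow weight.
            slow = tf.get_variable(var.op.name.replace("/", "_") + "_slow",
                                   initializer=var.initialized_value(),
                                   trainable=False)

            def merge_fn(var=var, slow=slow):
                # slow <- slow + alpha * (fast - slow); fast <- slow.
                new_slow = tf.assign(slow, slow + alpha * (var - slow))
                return tf.assign(var, new_slow)

            def skip_fn(var=var):
                return tf.identity(var)

            ops.append(tf.cond(merge_now, merge_fn, skip_fn))
    return ops
```

Running the returned ops once per training step reproduces the counter behavior described above: the merge branch fires on every k-th step and is a no-op otherwise.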

Experimental results

I have conducted experiments on a many-to-many recursive task with stacked weight-dropped LSTMs, as proposed in "Regularizing and Optimizing LSTM Language Models". \
Using lookahead with Adam, the training loss is higher than for the model without lookahead, but the validation loss with lookahead is slightly better.

Contact & Copyright

Code written by Jia-Yau Shiau (jiayau.shiau@gmail.com).