Project author: kamalkraj

Project description:
ALBERT model Pretraining and Fine Tuning using TF2.0
Language: Python
Project address: git://github.com/kamalkraj/ALBERT-TF2.0.git
Created: 2019-10-31T19:33:44Z
Project community: https://github.com/kamalkraj/ALBERT-TF2.0

License: Apache License 2.0

ALBERT-TF2.0

ALBERT model Fine Tuning using TF2.0

This repository contains a TensorFlow 2.0 implementation of ALBERT.

Requirements

  • python3
  • pip install -r requirements.txt
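A quick way to confirm the environment before running anything else (a minimal sketch; it only assumes that requirements.txt installs TensorFlow 2.x):

  import tensorflow as tf

  # TensorFlow 2.x should be reported, and any available GPUs should be listed.
  print("TensorFlow version:", tf.__version__)
  print("GPUs:", tf.config.experimental.list_physical_devices("GPU"))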

ALBERT Pre-training

ALBERT model pre-training from scratch and domain-specific fine-tuning. Instructions are here.

Download ALBERT TF 2.0 weights

Version 1   Version 2
base        base
large       large
xlarge      xlarge
xxlarge     xxlarge

Unzip the model inside the repo.

The above weights do not contain the final layer of the original model, so they can currently only be used for fine-tuning downstream tasks.

For full weights conversion from TF-HUB to TF 2.0, see here.
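Because the converted checkpoint is a Keras HDF5 file (tf2_model.h5), the weight names it stores can be listed with h5py to see what the download contains. This is only a minimal sketch; the "large/" path and the internal layout of the file are assumptions, not guarantees from the repo:

  import h5py

  # Assumes the "large" weights were unzipped into ./large/ inside the repo.
  with h5py.File("large/tf2_model.h5", "r") as f:
      # Print every group/dataset name stored in the checkpoint.
      f.visit(print)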

Download glue data

Download using the command below:

  python download_glue_data.py --data_dir glue_data --tasks all

Fine-tuning

To prepare the fine-tuning data for final model training, use the
create_finetuning_data.py script. The resulting datasets (in tf_record format)
and the training meta data should later be passed to the training or evaluation
scripts. The task-specific arguments are described in the following sections:

Creating fine-tuning data

  • Example: CoLA

  export GLUE_DIR=glue_data/
  export ALBERT_DIR=large/
  export TASK_NAME=CoLA
  export OUTPUT_DIR=cola_processed
  mkdir $OUTPUT_DIR

  python create_finetuning_data.py \
    --input_data_dir=${GLUE_DIR}/ \
    --spm_model_file=${ALBERT_DIR}/vocab/30k-clean.model \
    --train_data_output_path=${OUTPUT_DIR}/${TASK_NAME}_train.tf_record \
    --eval_data_output_path=${OUTPUT_DIR}/${TASK_NAME}_eval.tf_record \
    --meta_data_file_path=${OUTPUT_DIR}/${TASK_NAME}_meta_data \
    --fine_tuning_task_type=classification --max_seq_length=128 \
    --classification_task_name=${TASK_NAME}
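Once the script finishes, the produced files can be checked before training. The sketch below assumes the meta-data file is JSON and that the records use BERT-style feature names (input_ids, input_mask, segment_ids, label_ids); both are assumptions about the repo's output format rather than documented facts:

  import json
  import tensorflow as tf

  output_dir, task = "cola_processed", "CoLA"

  # Training meta data (assumed to be a JSON file with dataset sizes, etc.).
  with open(f"{output_dir}/{task}_meta_data") as f:
      print(json.load(f))

  # Count the training records and decode one example.
  dataset = tf.data.TFRecordDataset(f"{output_dir}/{task}_train.tf_record")
  print("train records:", sum(1 for _ in dataset))

  example = tf.train.Example()
  example.ParseFromString(next(iter(dataset)).numpy())
  print(list(example.features.feature.keys()))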

Running classifier

  export MODEL_DIR=CoLA_OUT

  python run_classifer.py \
    --train_data_path=${OUTPUT_DIR}/${TASK_NAME}_train.tf_record \
    --eval_data_path=${OUTPUT_DIR}/${TASK_NAME}_eval.tf_record \
    --input_meta_data_path=${OUTPUT_DIR}/${TASK_NAME}_meta_data \
    --albert_config_file=${ALBERT_DIR}/config.json \
    --task_name=${TASK_NAME} \
    --spm_model_file=${ALBERT_DIR}/vocab/30k-clean.model \
    --output_dir=${MODEL_DIR} \
    --init_checkpoint=${ALBERT_DIR}/tf2_model.h5 \
    --do_train \
    --do_eval \
    --train_batch_size=16 \
    --learning_rate=1e-5 \
    --custom_training_loop

By default, run_classifier runs 3 epochs and evaluates on the development set.

The above command yields a dev-set accuracy of 76.22 on the CoLA task.

The above run was tested on a single TITAN RTX 24 GB GPU.

SQuAD

Data and evaluation scripts

Training Data Preparation

  export SQUAD_DIR=SQuAD
  export SQUAD_VERSION=v1.1
  export ALBERT_DIR=large
  export OUTPUT_DIR=squad_out_${SQUAD_VERSION}
  mkdir $OUTPUT_DIR

  python create_finetuning_data.py \
    --squad_data_file=${SQUAD_DIR}/train-${SQUAD_VERSION}.json \
    --spm_model_file=${ALBERT_DIR}/vocab/30k-clean.model \
    --train_data_output_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_train.tf_record \
    --meta_data_file_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_meta_data \
    --fine_tuning_task_type=squad \
    --max_seq_length=384

Running Model

  python run_squad.py \
    --mode=train_and_predict \
    --input_meta_data_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_meta_data \
    --train_data_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_train.tf_record \
    --predict_file=${SQUAD_DIR}/dev-${SQUAD_VERSION}.json \
    --albert_config_file=${ALBERT_DIR}/config.json \
    --init_checkpoint=${ALBERT_DIR}/tf2_model.h5 \
    --spm_model_file=${ALBERT_DIR}/vocab/30k-clean.model \
    --train_batch_size=48 \
    --predict_batch_size=48 \
    --learning_rate=1e-5 \
    --num_train_epochs=3 \
    --model_dir=${OUTPUT_DIR} \
    --strategy_type=mirror

Running SQuAD V2.0

  export SQUAD_DIR=SQuAD
  export SQUAD_VERSION=v2.0
  export ALBERT_DIR=xxlarge
  export OUTPUT_DIR=squad_out_${SQUAD_VERSION}
  mkdir $OUTPUT_DIR

  python create_finetuning_data.py \
    --squad_data_file=${SQUAD_DIR}/train-${SQUAD_VERSION}.json \
    --spm_model_file=${ALBERT_DIR}/vocab/30k-clean.model \
    --train_data_output_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_train.tf_record \
    --meta_data_file_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_meta_data \
    --fine_tuning_task_type=squad \
    --max_seq_length=384

  python run_squad.py \
    --mode=train_and_predict \
    --input_meta_data_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_meta_data \
    --train_data_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_train.tf_record \
    --predict_file=${SQUAD_DIR}/dev-${SQUAD_VERSION}.json \
    --albert_config_file=${ALBERT_DIR}/config.json \
    --init_checkpoint=${ALBERT_DIR}/tf2_model.h5 \
    --spm_model_file=${ALBERT_DIR}/vocab/30k-clean.model \
    --train_batch_size=24 \
    --predict_batch_size=24 \
    --learning_rate=1.5e-5 \
    --num_train_epochs=3 \
    --model_dir=${OUTPUT_DIR} \
    --strategy_type=mirror \
    --version_2_with_negative \
    --max_seq_length=384

Experiments were run on 4 x NVIDIA TITAN RTX 24 GB GPUs.

Result

[SQuAD output image]

Multi-GPU training and XLA

  • Use the flag --strategy_type=mirror for multi-GPU training. Currently, all GPUs visible in the environment will be used.
  • Use the flag --enable-xla to enable XLA. Model training start-up time will increase because of JIT compilation. (A minimal sketch of both mechanisms follows below.)
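Both flags correspond to standard TF 2.x mechanisms. The sketch below is not the repo's training code, just a minimal illustration of tf.distribute.MirroredStrategy combined with XLA JIT compilation:

  import tensorflow as tf

  # Enable XLA JIT compilation; the first training steps are slower while graphs compile.
  tf.config.optimizer.set_jit(True)

  # MirroredStrategy replicates the model across all visible GPUs.
  strategy = tf.distribute.MirroredStrategy()
  print("Replicas in sync:", strategy.num_replicas_in_sync)

  with strategy.scope():
      # The model and optimizer must be created inside the strategy scope.
      model = tf.keras.Sequential([tf.keras.layers.Dense(2)])
      model.compile(
          optimizer=tf.keras.optimizers.Adam(1e-5),
          loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))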

Ignore

The warning below will be displayed at the end of each epoch if you use the Keras model.fit method. It is caused by an issue in the training-steps calculation when a tf.data dataset is passed to model.fit().
It has no effect on model performance, so it can be ignored; it will most likely be fixed in the next TF2 release. Issue-link

  2019-10-31 13:35:48.322897: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Out of range:
  End of sequence
       [[{{node IteratorGetNext}}]]
       [[model_1/albert_model/word_embeddings/Shape/_10]]
  2019-10-31 13:36:03.302722: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Out of range:
  End of sequence
       [[{{node IteratorGetNext}}]]
       [[IteratorGetNext/_4]]
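A common way to avoid the mismatch (a sketch, not the repo's code; the train_data_size key name is an assumption about the meta-data format) is to pass an explicit steps_per_epoch to model.fit so the tf.data iterator is not exhausted mid-epoch:

  import json

  with open("cola_processed/CoLA_meta_data") as f:
      meta = json.load(f)

  train_batch_size = 16
  # Assumed key name; adjust to whatever the meta-data file actually stores.
  steps_per_epoch = meta["train_data_size"] // train_batch_size

  # model.fit(train_dataset, epochs=3, steps_per_epoch=steps_per_epoch)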

References

  1. TensorFlow official implementation of BERT in TF 2.0. Large parts of the code in this repo are adapted from it.
  2. LAMB optimizer from TensorFlow Addons.
  3. TF-HUB weights to TF 2.0 weights conversion: KPE