Project author: utahnlp

Project description:
Implementation of models in our EMNLP 2019 paper: A Logic-Driven Framework for Consistency of Neural Models
Language: Python
Repository: git://github.com/utahnlp/consistency.git
Created: 2019-08-14T02:59:18Z
Project community: https://github.com/utahnlp/consistency

License: Apache License 2.0

Implementation of the NLI model in our EMNLP 2019 paper: A Logic-Driven Framework for Consistency of Neural Models

  @inproceedings{li2019consistency,
    author = {Li, Tao and Gupta, Vivek and Mehta, Maitrey and Srikumar, Vivek},
    title = {A Logic-Driven Framework for Consistency of Neural Models},
    booktitle = {Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing},
    year = {2019}
  }

Heads-up

To pick up recent fixes in this repo and updates in pytorch/huggingface/apex, try the branch post-camera-ready.
For exact reproducibility, stick to this branch.


0. Prerequisites

[Hardware]
All of our BERT models are based on the BERT base version. The batch size, sequence length, and data format are configured to run smoothly on a CUDA device with 8GB of memory.

Have the following installed:

  python 3.6+
  NVCC compiler 10.0
  pytorch 1.0
  h5py
  numpy
  spacy 2.0.11 (with en model)
  nvidia apex
  pytorch BERT by huggingface (https://github.com/huggingface/pytorch-pretrained-BERT)
    (download and put in ../pytorch-pretrained-BERT, not necessarily installed)
    (However, for exact reproducibility, use the pytorch-pretrained-BERT.zip in this repo)
  glove.840B.300d.txt (under ./data/)
    (We don't actually use it, but need it for preprocessing (due to an old design).)
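
A quick way to confirm that the Python-side prerequisites are importable is sketched below. It is not part of this repo: the helper name `missing_modules` is hypothetical, and the import names (`torch`, `h5py`, `numpy`, `spacy`, `apex`) are assumed to be the usual ones for the packages listed above. It does not check versions, NVCC, or the data files.

```python
import importlib.util

# Import names for the Python packages listed above (assumed to be the usual ones).
REQUIRED_MODULES = ["torch", "h5py", "numpy", "spacy", "apex"]


def missing_modules(names):
    """Return the subset of `names` that cannot be located on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed Python packages are importable.")
```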

[SNLI]
In addition to the above, make sure the snli_1.0 data is unpacked to ./data/bert_nli/, e.g. ./data/bert_nli/snli_1.0_train.txt.

[MNLI]
Also unpack the mnli_1.0 data to ./data/bert_nli/. We use mnli_dev_matched for validation and mnli_dev_mismatched for testing. For example, the validation file should be at ./data/bert_nli/multinli_1.0_dev_matched.txt

[MSCOCO]
Unpack the MSCOCO sample data with unzip ./data/bert_nli/mscoco.zip. The zip file contains a training split (e.g. mscoco.raw.sent1.txt) with 400k sentence triples and a test split (e.g. mscoco.test.raw.sent1.txt) with 100k sentence triples. For our paper, we sampled 100k triples (i.e. 25%) from the training split and used all examples in the test split.
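
The 25% subsample can be pictured as drawing one set of row indices and applying it to all three sentence files, so that each triple stays aligned. The sketch below only illustrates that idea; `sample_indices` and `select_rows` are hypothetical helpers, and the actual sampling in this repo is driven by the --unlabeled_perc training flag and may be implemented differently.

```python
import random


def sample_indices(num_rows, fraction, seed):
    """Pick round(num_rows * fraction) distinct row indices, reproducibly for a given seed."""
    rng = random.Random(seed)
    k = round(num_rows * fraction)
    return sorted(rng.sample(range(num_rows), k))


def select_rows(rows, indices):
    """Keep only the rows at the given indices, preserving their order."""
    return [rows[i] for i in indices]


if __name__ == "__main__":
    # Toy stand-ins for mscoco.raw.sent1/2/3.txt; the real training split has 400k triples.
    sent1 = [f"a{i}" for i in range(8)]
    sent2 = [f"b{i}" for i in range(8)]
    sent3 = [f"c{i}" for i in range(8)]
    idx = sample_indices(len(sent1), 0.25, seed=1)
    # The same indices are applied to every file so each triple stays intact.
    for a, b, c in zip(select_rows(sent1, idx), select_rows(sent2, idx), select_rows(sent3, idx)):
        print(a, b, c)
```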

1. Preprocessing

[SNLI]
Preprocessing of SNLI is separated into the following steps.

  python3 snli_extract.py --data ./data/bert_nli/snli_1.0_train.txt --output ./data/bert_nli/train
  python3 snli_extract.py --data ./data/bert_nli/snli_1.0_dev.txt --output ./data/bert_nli/val
  python3 snli_extract.py --data ./data/bert_nli/snli_1.0_test.txt --output ./data/bert_nli/test
  python3 preprocess.py --glove ./data/glove.840B.300d.txt --batch_size 48 --dir ./data/bert_nli/ --output snli --tokenizer_output snli
  python3 get_char_idx.py --dict ./data/bert_nli/snli.allword.dict --token_l 16 --freq 5 --output char

NOTE: For exact reproducibility, we use dev_excl_anno.raw.sent*.txt for the actual SNLI validation. These files are already included in the ./data/bert_nli/ directory and are used implicitly by the above scripts. They differ from the standard dev set in that we reserved 1000 examples for preliminary manual analysis and later excluded them from experiments to avoid contamination.
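
For orientation, the extraction step above amounts to pulling the two sentences and the gold label out of the tab-separated SNLI file. The sketch below is not the repository's snli_extract.py; it assumes the standard SNLI column names (`gold_label`, `sentence1`, `sentence2`), and skipping rows labeled "-" is also an assumption. `extract_pairs` is a hypothetical name.

```python
import csv
import io


def extract_pairs(tsv_file):
    """Yield (sentence1, sentence2, gold_label) from an SNLI-style tab-separated file.

    Rows whose gold label is "-" (no annotator consensus) are skipped; whether the
    repository's extract scripts do the same is an assumption.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row["gold_label"] == "-":
            continue
        yield row["sentence1"], row["sentence2"], row["gold_label"]


if __name__ == "__main__":
    # A two-row stand-in for snli_1.0_train.txt, reduced to the columns used here.
    sample = io.StringIO(
        "gold_label\tsentence1\tsentence2\n"
        "entailment\tA man is sleeping.\tA person rests.\n"
        "-\tA dog runs.\tA cat sits.\n"
    )
    for s1, s2, label in extract_pairs(sample):
        print(label, "|", s1, "|", s2)
```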

[MNLI]
Preprocessing of MNLI dataset:

  python3 mnli_extract.py --data ./data/bert_nli/multinli_1.0_dev_mismatched.txt --output ./data/bert_nli/mnli.test
  python3 mnli_extract.py --data ./data/bert_nli/multinli_1.0_train.txt --output ./data/bert_nli/mnli.train
  python3 mnli_extract.py --data ./data/bert_nli/multinli_1.0_dev_matched.txt --output ./data/bert_nli/mnli.dev
  python3 preprocess.py --glove ./data/glove.840B.300d.txt --batch_size 36 --dir ./data/bert_nli/ \
    --sent1 mnli.train.raw.sent1.txt --sent2 mnli.train.raw.sent2.txt --label mnli.train.label.txt \
    --sent1_val mnli.dev.raw.sent1.txt --sent2_val mnli.dev.raw.sent2.txt --label_val mnli.dev.label.txt \
    --sent1_test mnli.test.raw.sent1.txt --sent2_test mnli.test.raw.sent2.txt --label_test mnli.test.label.txt \
    --tokenizer_output mnli --output mnli --max_seq_l 500

[MSCOCO]
Preprocessing of mscoco dataset:

  python3 extra_preprocess.py --glove ./data/glove.840B.300d.txt --batch_size 48 --dir ./data/bert_nli/ --sent1 mscoco.raw.sent1.txt --sent2 mscoco.raw.sent2.txt --sent3 mscoco.raw.sent3.txt --tokenizer_output mscoco --output mscoco
  python3 extra_preprocess.py --glove ./data/glove.840B.300d.txt --batch_size 48 --dir ./data/bert_nli/ --sent1 mscoco.test.raw.sent1.txt --sent2 mscoco.test.raw.sent2.txt --sent3 mscoco.test.raw.sent3.txt --tokenizer_output mscoco.test --output mscoco.test

2. BERT Baseline

[Finetuning once] on both SNLI and MNLI

  mkdir models
  GPUID=[GPUID]
  LR=0.00003
  PERC=1
  for SEED in `seq 1 3`; do
    CUDA_VISIBLE_DEVICES=$GPUID python3 -u train.py --gpuid 0 --bert_gpuid 0 --dir ./data/bert_nli/ \
      --train_data mnli.train.hdf5 --val_data mnli.val.hdf5 --extra_train_data snli.train.hdf5 --extra_val_data snli.val.hdf5 \
      --learning_rate $LR --epochs 3 --warmup_epoch 3 \
      --enc bert --cls linear --hidden_size 768 --percent $PERC --dropout 0.0 \
      --fix_bert 0 --optim adam_fp16 --fp16 1 --seed ${SEED} \
      --save_file models/scratch_mnli_snli_perc${PERC//.}_seed${SEED} | tee models/scratch_mnli_snli_perc${PERC//.}_seed${SEED}.txt
  done

Change [GPUID] to the desired device id. PERC specifies the fraction of training data to use (1 is 100%). The above script trains BERT baselines with three different random seeds (i.e. three runs in a row). Expect to see exactly the same accuracy as we reported in our paper.

We also disabled dropout in the final linear layer. However, a dropout of 0.1 (the default) is still applied inside BERT during training.

[Finetuning twice] on both SNLI and MNLI

  GPUID=[GPUID]
  LR=0.00001
  PERC=1
  for SEED in `seq 1 3`; do
    CUDA_VISIBLE_DEVICES=$GPUID python3 -u train.py --gpuid 0 --bert_gpuid 0 --dir ./data/bert_nli/ \
      --train_data mnli.train.hdf5 --val_data mnli.val.hdf5 --extra_train_data snli.train.hdf5 --extra_val_data snli.val.hdf5 \
      --learning_rate $LR --epochs 3 --warmup_epoch 3 \
      --enc bert --cls linear --hidden_size 768 --percent $PERC --dropout 0.0 \
      --fix_bert 0 --optim adam_fp16 --fp16 1 --seed ${SEED} \
      --load_file models/scratch_mnli_snli_perc${PERC//.}_seed${SEED} \
      --save_file models/twice_scratch_mnli_snli_perc${PERC//.}_seed${SEED} | tee models/twice_scratch_mnli_snli_perc${PERC//.}_seed${SEED}.txt
  done

This will load the previously finetuned model and continue finetuning with a lower learning rate. Expect to see exactly the same accuracy as we reported in our paper.

[Evaluation] on SNLI test set

  GPUID=[GPUID]
  PERC=1
  SEED=[SEED]
  CUDA_VISIBLE_DEVICES=$GPUID python3 -u eval.py --gpuid 0 --bert_gpuid 0 --dir data/bert_nli/ --data snli.test.hdf5 \
    --enc bert --cls linear --hidden_size 768 --fp16 1 --dropout 0.0 \
    --load_file models/scratch_mnli_snli_perc${PERC//.}_seed${SEED} | tee models/scratch_mnli_snli_perc${PERC//.}_seed${SEED}.evallog.txt

For MNLI, use --data mnli.test.hdf5.

[Evaluation] on mirror consistency

  GPUID=[GPUID]
  PERC=1
  for SWAP_SENT in 0 1; do
    for SEED in `seq 1 3`; do
      CUDA_VISIBLE_DEVICES=$GPUID python3 -u eval.py --gpuid 0 --bert_gpuid 0 --dir data/bert_nli/ --data mscoco.test.hdf5 \
        --enc bert --cls linear --hidden_size 768 --fp16 1 --dropout 0.0 --swap_sent $SWAP_SENT \
        --pred_output models/twice_scratch_mnli_snli_perc${PERC//.}_seed${SEED}_swap${SWAP_SENT} \
        --load_file models/twice_scratch_mnli_snli_perc${PERC//.}_seed${SEED} | tee models/twice_scratch_mnli_snli_perc${PERC//.}_seed${SEED}.evallog.txt
    done
  done
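
The two runs above (--swap_sent 0 and --swap_sent 1) predict a label for each sentence pair in its original and its swapped order. One way to turn these into a mirror-consistency number is sketched below. It assumes the constraint is that a contradiction prediction should be preserved when the two sentences are swapped, and that the predictions have already been loaded as two aligned lists of label strings. The function name and the in-memory format are hypothetical; the repo's own confusion_table.py (used in the BERT+M section) is the reference tool and may compute a different statistic.

```python
def mirror_violation_rate(preds, preds_swapped, label="contradiction"):
    """Fraction of pairs where exactly one of the two orders is predicted as `label`."""
    if len(preds) != len(preds_swapped):
        raise ValueError("prediction lists must be aligned")
    if not preds:
        return 0.0
    violations = sum(
        (a == label) != (b == label) for a, b in zip(preds, preds_swapped)
    )
    return violations / len(preds)


if __name__ == "__main__":
    # Toy predictions; real ones would come from the --pred_output files.
    forward = ["contradiction", "neutral", "entailment", "contradiction"]
    swapped = ["contradiction", "contradiction", "neutral", "neutral"]
    print(mirror_violation_rate(forward, swapped))
```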

[Evaluation] on transitivity consistency

  GPUID=[GPUID]
  PERC=1
  for PAIR in alpha beta gamma; do
    for SEED in `seq 1 3`; do
      CUDA_VISIBLE_DEVICES=$GPUID python3 -u eval.py --gpuid 0 --bert_gpuid 0 --dir data/bert_nli/ --data mscoco.test.hdf5 \
        --enc bert --cls linear --hidden_size 768 --fp16 1 --dropout 0.0 --data_triple_mode 1 --sent_pair $PAIR --swap_sent 0 \
        --pred_output models/twice_scratch_mnli_snli_perc${PERC//.}_seed${SEED}_${PAIR} \
        --load_file models/twice_scratch_mnli_snli_perc${PERC//.}_seed${SEED} | tee models/twice_scratch_mnli_snli_perc${PERC//.}_seed${SEED}.evallog.txt
    done
  done
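
The three runs per seed above give predictions for three sentence pairs drawn from each MSCOCO triple. The sketch below counts violations of one possible transitivity rule: if (P, H) and (H, Z) are both predicted as entailment, then (P, Z) should be too. It assumes that alpha, beta, and gamma correspond to the pairs (P, H), (H, Z), and (P, Z) respectively; that mapping is a guess, not something this README states. The function name is hypothetical, and the repo's own triple_confusion.py (used in the BERT+M section) remains the reference tool.

```python
def transitivity_violations(alpha, beta, gamma, label="entailment"):
    """Count triples where alpha and beta are both `label` but gamma is not.

    Returns (violations, applicable), where `applicable` is the number of
    triples whose premise (alpha and beta both `label`) holds.
    """
    if not (len(alpha) == len(beta) == len(gamma)):
        raise ValueError("prediction lists must be aligned")
    applicable = 0
    violations = 0
    for a, b, g in zip(alpha, beta, gamma):
        if a == label and b == label:
            applicable += 1
            if g != label:
                violations += 1
    return violations, applicable


if __name__ == "__main__":
    # Toy predictions; real ones would come from the three --pred_output files.
    alpha = ["entailment", "entailment", "neutral", "entailment"]
    beta = ["entailment", "entailment", "entailment", "contradiction"]
    gamma = ["entailment", "neutral", "contradiction", "contradiction"]
    print(transitivity_violations(alpha, beta, gamma))
```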

3. BERT+M

  GPUID=[GPUID]
  LR=0.00001
  CONSTR=6
  PERC=1
  LAMBD=1
  for SEED in `seq 1 3`; do
    CUDA_VISIBLE_DEVICES=$GPUID python3 -u train.py --gpuid 0 --bert_gpuid 0 --dir ./data/bert_nli/ \
      --train_data mnli.train.hdf5 --val_data mnli.val.hdf5 --extra_train_data snli.train.hdf5 --extra_val_data snli.val.hdf5 \
      --learning_rate $LR --epochs 3 --warmup_epoch 3 \
      --loss transition --fwd_mode flip --lambd ${LAMBD} \
      --enc bert --cls linear --hidden_size 768 --percent $PERC --dropout 0.0 --constr ${CONSTR} \
      --fix_bert 0 --optim adam_fp16 --fp16 1 --seed ${SEED} \
      --load_file models/scratch_mnli_snli_perc${PERC//.}_seed${SEED} \
      --save_file models/both_flip${CONSTR//,}_lr${LR//.}_lambd${LAMBD//.}_perc${PERC//.}_seed${SEED} | tee models/both_flip${CONSTR//,}_lr${LR//.}_lambd${LAMBD//.}_perc${PERC//.}_seed${SEED}.txt
  done

Change PERC and LAMBD to match the setting you want to run.
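
The checkpoint names above rely on bash pattern substitution: `${VAR//.}` deletes every `.` from the value, and `${CONSTR//,}` deletes every `,`. When writing an evaluation script for a different setting, it helps to know the resulting name in advance. The Python sketch below reproduces the same naming for the BERT+M checkpoints; `strip_char` and `bert_m_checkpoint` are hypothetical helpers, not part of this repo, and values are passed as strings because that is how the shell sees them.

```python
def strip_char(value, char):
    """Mimic bash ${value//char}: delete every occurrence of `char`."""
    return value.replace(char, "")


def bert_m_checkpoint(constr, lr, lambd, perc, seed):
    """Build the BERT+M checkpoint path used by --save_file / --load_file above."""
    return "models/both_flip{}_lr{}_lambd{}_perc{}_seed{}".format(
        strip_char(constr, ","),
        strip_char(lr, "."),
        strip_char(lambd, "."),
        strip_char(perc, "."),
        seed,
    )


if __name__ == "__main__":
    print(bert_m_checkpoint("6", "0.00001", "1", "1", 1))
    # -> models/both_flip6_lr000001_lambd1_perc1_seed1
    print(bert_m_checkpoint("6", "0.00001", "1", "0.2", 3))
    # -> models/both_flip6_lr000001_lambd1_perc02_seed3
```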

[Evaluation] on mirror consistency

  GPUID=[GPUID]
  LR=0.00001
  CONSTR=6
  PERC=0.2
  LAMBD=1
  for SWAP_SENT in 0 1; do
    for SEED in `seq 1 3`; do
      CUDA_VISIBLE_DEVICES=$GPUID python3 -u eval.py --gpuid 0 --bert_gpuid 0 --dir ./data/bert_nli/ --data mscoco.test.hdf5 \
        --enc bert --cls linear --dropout 0.0 --hidden_size 768 --fp16 1 --data_triple_mode 0 --swap_sent $SWAP_SENT \
        --pred_output models/both_flip${CONSTR//,}_lr${LR//.}_lambd${LAMBD//.}_perc${PERC//.}_seed${SEED}_swap${SWAP_SENT} \
        --load_file models/both_flip${CONSTR//,}_lr${LR//.}_lambd${LAMBD//.}_perc${PERC//.}_seed${SEED} | tee models/both_flip${CONSTR//,}_lr${LR//.}_lambd${LAMBD//.}_perc${PERC//.}_seed${SEED}.triplelog.txt
    done
  done
  python3 confusion_table.py --log both_flip${CONSTR//,}_lr${LR//.}_lambd${LAMBD//.}_perc${PERC//.}

[Evaluation] on transitivity consistency

  GPUID=[GPUID]
  LR=0.00001
  CONSTR=6
  PERC=0.2
  LAMBD=1
  for PAIR in alpha beta gamma; do
    for SEED in `seq 1 3`; do
      CUDA_VISIBLE_DEVICES=$GPUID python3 -u eval.py --gpuid 0 --bert_gpuid 0 --dir ./data/bert_nli/ --data mscoco.test.hdf5 \
        --enc bert --cls linear --dropout 0.0 --hidden_size 768 --fp16 1 --data_triple_mode 1 --sent_pair $PAIR \
        --pred_output models/both_flip${CONSTR//,}_lr${LR//.}_lambd${LAMBD//.}_perc${PERC//.}_seed${SEED}_${PAIR} \
        --load_file models/both_flip${CONSTR//,}_lr${LR//.}_lambd${LAMBD//.}_perc${PERC//.}_seed${SEED} | tee models/both_flip${CONSTR//,}_lr${LR//.}_lambd${LAMBD//.}_perc${PERC//.}_seed${SEED}.triplelog.txt
    done
  done
  for SEED in `seq 1 3`; do
    python3 triple_confusion.py --log both_flip${CONSTR//,}_lr${LR//.}_lambd${LAMBD//.}_perc${PERC//.} --seed $SEED
  done

4. BERT+M,U

  GPUID=[GPUID]
  PERC=0.01
  PERC_U=0.25
  CONSTR=6
  LR=0.000005
  LAMBD=1
  LAMBD_P=0.001
  for SEED in `seq 1 3`; do
    CUDA_VISIBLE_DEVICES=$GPUID python3 -u train.py --gpuid 0 --bert_gpuid 0 --dir ./data/bert_nli/ \
      --train_data mnli.train.hdf5 --val_data mnli.val.hdf5 --extra_train_data snli.train.hdf5 --extra_val_data snli.val.hdf5 \
      --unlabeled_data mscoco.hdf5 --unlabeled_triple_mode 0 \
      --loss transition --fwd_mode flip_and_unlabeled --lambd ${LAMBD} \
      --learning_rate $LR --epochs 3 --warmup_epoch 3 --dropout 0.0 --constr ${CONSTR} \
      --enc bert --cls linear --hidden_size 768 --percent $PERC --unlabeled_perc ${PERC_U} --lambd_p $LAMBD_P \
      --fix_bert 0 --optim adam_fp16 --fp16 1 --seed ${SEED} \
      --load_file models/scratch_mnli_snli_perc${PERC//.}_seed${SEED} \
      --save_file models/both_mscoco_flip${CONSTR//,}_lr${LR//.}_lambd${LAMBD//.}_${LAMBD_P//.}_perc${PERC//.}_${PERC_U//.}_seed${SEED} | tee models/both_mscoco_flip${CONSTR//,}_lr${LR//.}_lambd${LAMBD//.}_${LAMBD_P//.}_perc${PERC//.}_${PERC_U//.}_seed${SEED}.txt
  done

Here we set PERC_U=0.25 to sample about 100k unlabeled instance pairs (U) for training.

Change PERC, LAMBD, and LAMBD_P to match the setting you want to run. For evaluation, adapt the evaluation scripts above accordingly.

5. BERT+M,U,T

  GPUID=[GPUID]
  PERC=0.01
  PERC_U=0.25
  CONSTR=1,2,3,4,6
  LR=0.000005
  LAMBD=1
  LAMBD_P=0.00001
  LAMBD_T=0.000001
  for SEED in `seq 3 3`; do
    CUDA_VISIBLE_DEVICES=$GPUID python3 -u train.py --gpuid 0 --bert_gpuid 0 --dir ./data/bert_nli/ \
      --train_data mnli.train.hdf5 --val_data mnli.val.hdf5 --extra_train_data snli.train.hdf5 --extra_val_data snli.val.hdf5 \
      --unlabeled_data mscoco.hdf5 --unlabeled_triple_mode 1 \
      --loss transition --fwd_mode flip_and_triple --fix_bert 0 --optim adam_fp16 --fp16 1 --weight_decay 1 \
      --learning_rate $LR --epochs 3 --warmup_epoch 3 --dropout 0.0 --constr ${CONSTR} \
      --enc bert --cls linear --hidden_size 768 --percent $PERC --unlabeled_perc ${PERC_U} --lambd ${LAMBD} --lambd_p $LAMBD_P --lambd_t $LAMBD_T \
      --seed ${SEED} \
      --load_file models/scratch_mnli_snli_perc${PERC//.}_seed${SEED} \
      --save_file models/both_mscoco_flip_triple${CONSTR//,}_lr${LR//.}_lambd${LAMBD//.}_${LAMBD_P//.}_${LAMBD_T//.}_perc${PERC//.}_${PERC_U//.}_seed${SEED} | tee models/both_mscoco_flip_triple${CONSTR//,}_lr${LR//.}_lambd${LAMBD//.}_${LAMBD_P//.}_${LAMBD_T//.}_perc${PERC//.}_${PERC_U//.}_seed${SEED}.txt
  done

Here we set PERC_U=0.25 to sample about 100k unlabeled instance triples (T) for training.

Change PERC, LAMBD, LAMBD_P, and LAMBD_T to match the setting you want to run. For evaluation, adapt the evaluation scripts above accordingly.

Hyperparameters

Please refer to the appendices of our paper for details of the hyperparameters. The values of --learning_rate, --lambd, --lambd_p, and --lambd_t vary with the data percentages set by --percent and --unlabeled_perc.
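
Since several flags have to change together, it can help to keep each setting in one place and generate the flags from it. The sketch below does that for the two example settings shown in sections 4 and 5 of this README. The dictionary layout and `to_flags` are hypothetical, and values for other percentages should be taken from the paper's appendices rather than from here.

```python
# Example settings copied from the BERT+M,U and BERT+M,U,T scripts in this README.
# Values are kept as strings so they reach train.py exactly as written.
SETTINGS = {
    "bert_m_u": {
        "percent": "0.01",
        "unlabeled_perc": "0.25",
        "learning_rate": "0.000005",
        "lambd": "1",
        "lambd_p": "0.001",
    },
    "bert_m_u_t": {
        "percent": "0.01",
        "unlabeled_perc": "0.25",
        "learning_rate": "0.000005",
        "lambd": "1",
        "lambd_p": "0.00001",
        "lambd_t": "0.000001",
    },
}


def to_flags(setting):
    """Flatten a setting into a list of command-line arguments for train.py."""
    flags = []
    for name, value in setting.items():
        flags.extend(["--" + name, value])
    return flags


if __name__ == "__main__":
    for name, setting in SETTINGS.items():
        print(name, " ".join(to_flags(setting)))
```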

Issues & To-dos

  • Sanity check