Project author: mohammadKhalifa

Project description:
Named Entity Recognition with Pretrained XLM-RoBERTa
Language: Python
Repository: git://github.com/mohammadKhalifa/xlm-roberta-ner.git
Created: 2020-02-23T08:19:25Z
Project page: https://github.com/mohammadKhalifa/xlm-roberta-ner

License: MIT License


NER with XLM-RoBERTa

Fine-tuning of the XLM-RoBERTa cross-lingual architecture for sequence tagging, specifically Named Entity Recognition (NER).

The code is inspired by the BERT-NER repo by kamalkraj.

Requirements

  • python 3.6+
  • torch 1.x
  • fairseq
  • pytorch_transformers (for AdamW and the warmup scheduler)

Setting up

    export PARAM_SET=base  # change to large to use the large architecture
    # clone the repo
    git clone https://github.com/mohammadKhalifa/xlm-roberta-ner.git
    cd xlm-roberta-ner/
    mkdir pretrained_models
    wget -P pretrained_models https://dl.fbaipublicfiles.com/fairseq/models/xlmr.$PARAM_SET.tar.gz
    tar xzvf pretrained_models/xlmr.$PARAM_SET.tar.gz --directory pretrained_models/
    rm -r pretrained_models/xlmr.$PARAM_SET.tar.gz

Training and evaluating

The code expects the directory passed via --data_dir to contain three dataset splits: train.txt, valid.txt and test.txt.
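The README does not spell out the file format, so the following is only a sketch under an assumption: that each split uses the usual CoNLL layout, with one token per line, whitespace-separated columns, the NER label in the last column, and a blank line between sentences. The function name `read_conll` is hypothetical and is not taken from the repo.

```python
from typing import Iterable, List, Tuple

Sentence = Tuple[List[str], List[str]]  # (tokens, labels)


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group CoNLL-style lines into (tokens, labels) pairs, one per sentence."""
    sentences: List[Sentence] = []
    tokens: List[str] = []
    labels: List[str] = []
    for raw in lines:
        line = raw.strip()
        # A blank line or a document marker closes the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if tokens:
                sentences.append((tokens, labels))
                tokens, labels = [], []
            continue
        columns = line.split()
        tokens.append(columns[0])    # assumed: token is the first column
        labels.append(columns[-1])   # assumed: NER label is the last column
    if tokens:  # flush the last sentence if the file has no trailing blank line
        sentences.append((tokens, labels))
    return sentences


sample = """-DOCSTART- -X- -X- O

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O

Peter NNP B-NP B-PER
Blackburn NNP I-NP I-PER
""".splitlines()

for sentence_tokens, sentence_labels in read_conll(sample):
    print(sentence_tokens, sentence_labels)
```

To read a real split, the same function could be fed an open file handle, for example `read_conll(open("data/coNLL-2003/train.txt", encoding="utf-8"))`.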

Training arguments:

    -h, --help            show this help message and exit
    --data_dir DATA_DIR   The input data dir. Should contain the .tsv files (or
                          other data files) for the task.
    --pretrained_path PRETRAINED_PATH
                          pretrained XLM-Roberta model path
    --task_name TASK_NAME
                          The name of the task to train.
    --output_dir OUTPUT_DIR
                          The output directory where the model predictions and
                          checkpoints will be written.
    --max_seq_length MAX_SEQ_LENGTH
                          The maximum total input sequence length after
                          WordPiece tokenization. Sequences longer than this
                          will be truncated, and sequences shorter than this
                          will be padded.
    --do_train            Whether to run training.
    --do_eval             Whether to run eval or not.
    --eval_on EVAL_ON     Whether to run eval on the dev set or test set.
    --do_lower_case       Set this flag if you are using an uncased model.
    --train_batch_size TRAIN_BATCH_SIZE
                          Total batch size for training.
    --eval_batch_size EVAL_BATCH_SIZE
                          Total batch size for eval.
    --learning_rate LEARNING_RATE
                          The initial learning rate for Adam.
    --num_train_epochs NUM_TRAIN_EPOCHS
                          Total number of training epochs to perform.
    --warmup_proportion WARMUP_PROPORTION
                          Proportion of training to perform linear learning rate
                          warmup for. E.g., 0.1 = 10% of training.
    --weight_decay WEIGHT_DECAY
                          Weight decay if we apply some.
    --adam_epsilon ADAM_EPSILON
                          Epsilon for Adam optimizer.
    --max_grad_norm MAX_GRAD_NORM
                          Max gradient norm.
    --no_cuda             Whether not to use CUDA when available
    --seed SEED           random seed for initialization
    --gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS
                          Number of updates steps to accumulate before
                          performing a backward/update pass.
    --fp16                Whether to use 16-bit float precision instead of
                          32-bit
    --fp16_opt_level FP16_OPT_LEVEL
                          For fp16: Apex AMP optimization level selected in
                          ['O0', 'O1', 'O2', and 'O3']. See details at
                          https://nvidia.github.io/apex/amp.html
    --loss_scale LOSS_SCALE
                          Loss scaling to improve fp16 numeric stability. Only
                          used when fp16 set to True. 0 (default value): dynamic
                          loss scaling. Positive power of 2: static loss scaling
                          value.
    --dropout DROPOUT     training dropout probability
    --freeze_model        whether to freeze the XLM-R base model and train only
                          the classification heads
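To make the effect of --warmup_proportion concrete, here is a minimal sketch of a linear warmup schedule. It assumes the learning rate ramps linearly from zero over the warmup steps and then decays linearly to zero, which is how the linear warmup scheduler in pytorch_transformers behaves; whether main.py uses exactly this schedule is an assumption. The function name `linear_warmup_factor` is hypothetical.

```python
def linear_warmup_factor(step: int, total_steps: int, warmup_proportion: float) -> float:
    """Multiplier applied to the base learning rate at a given optimizer step."""
    warmup_steps = int(total_steps * warmup_proportion)
    if step < warmup_steps:
        # Ramp up linearly from 0 to 1 during warmup.
        return step / max(1, warmup_steps)
    # Decay linearly from 1 to 0 over the remaining steps.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


base_lr = 7e-5  # the value of --learning_rate in the example command
for step in (0, 50, 100, 550, 1000):
    factor = linear_warmup_factor(step, total_steps=1000, warmup_proportion=0.1)
    print(step, base_lr * factor)
```

With --warmup_proportion=0.0 there are no warmup steps, so under this sketch the schedule starts at the full learning rate and only decays.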

For example:

    python main.py \
        --data_dir=data/coNLL-2003/ \
        --task_name=ner \
        --output_dir=model_dir/ \
        --max_seq_length=16 \
        --num_train_epochs 1 \
        --do_eval \
        --warmup_proportion=0.1 \
        --pretrained_path pretrained_models/xlmr.$PARAM_SET/ \
        --learning_rate 0.00007 \
        --do_train \
        --eval_on test \
        --train_batch_size 4 \
        --dropout 0.2

To use the XLM-R model's outputs as features without fine-tuning, pass the --freeze_model argument.
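Freezing an encoder in PyTorch usually amounts to disabling gradients on its parameters. The sketch below illustrates the idea on a toy module; `TinyTagger`, `freeze_encoder` and the attribute names are hypothetical stand-ins, not the repo's actual classes, and the linear layer only plays the role of the XLM-R encoder.

```python
import torch.nn as nn


class TinyTagger(nn.Module):
    """Toy stand-in: an 'encoder' followed by a token classification head."""

    def __init__(self, hidden_size: int = 8, num_labels: int = 3):
        super().__init__()
        self.encoder = nn.Linear(hidden_size, hidden_size)  # placeholder for XLM-R
        self.classification_head = nn.Linear(hidden_size, num_labels)

    def forward(self, features):
        return self.classification_head(self.encoder(features))


def freeze_encoder(model: TinyTagger) -> None:
    """Stop gradient updates for the encoder so that only the head is trained."""
    for parameter in model.encoder.parameters():
        parameter.requires_grad = False


model = TinyTagger()
freeze_encoder(model)
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # only the classification head's weight and bias remain trainable
```

When building the optimizer, passing only the parameters with `requires_grad=True` avoids tracking optimizer state for the frozen weights.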

By default, the model that performs best on the validation set is saved to args.output_dir. If --do_eval and --eval_on test are set, this model is then loaded and evaluated on the test set.

Results

CoNLL-2003

I tried to reproduce the results reported in the XLM-R paper by training the models with the following settings:

    --max_seq_length=128
    --num_train_epochs 10
    --warmup_proportion=0.0
    --learning_rate 6e-5
    --gradient_accumulation_steps 4
    --dropout 0.2
    --train_batch_size 32
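As a rough guide to what these settings imply, the sketch below works out the per-step batch size and the number of optimizer updates. It assumes the convention used in BERT-NER-style scripts, where --train_batch_size is the total batch and is divided by --gradient_accumulation_steps to get the size of each forward/backward pass; whether main.py follows this exactly is an assumption. The training-set size of 14,000 sentences is an illustrative figure, and `training_schedule` is a hypothetical helper.

```python
def training_schedule(num_examples: int, train_batch_size: int,
                      gradient_accumulation_steps: int, num_train_epochs: int):
    """Return (per-step batch size, total optimizer updates) under the stated assumption."""
    # Assumed convention: the total batch is split across accumulation steps.
    micro_batch_size = train_batch_size // gradient_accumulation_steps
    updates_per_epoch = int(num_examples / micro_batch_size / gradient_accumulation_steps)
    return micro_batch_size, updates_per_epoch * num_train_epochs


micro_batch, total_updates = training_schedule(
    num_examples=14000,            # illustrative training-set size
    train_batch_size=32,
    gradient_accumulation_steps=4,
    num_train_epochs=10,
)
print(micro_batch, total_updates)  # 8 examples per pass, 4370 optimizer updates
```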

I got the following F1 scores:

Model         Dev F1    Test F1
XLMR-Base     95.29     91.14
XLMR-Large    96.14     91.81

These results are close to, but slightly below, those reported in the paper, probably because of differences in the experimental settings.
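For readers unfamiliar with how NER F1 is usually computed, here is a minimal sketch of entity-level scoring. It assumes BIO-style labels and exact-match spans, as in the CoNLL evaluation convention; it is not the repo's evaluation code, and the function names are hypothetical.

```python
def extract_spans(labels):
    """Return the set of (entity_type, start, end) spans in a BIO label sequence."""
    spans = set()
    start, entity_type = None, None
    # The trailing "O" is a sentinel that flushes a span ending at the last token.
    for index, label in enumerate(list(labels) + ["O"]):
        prefix, _, tag = label.partition("-")
        continues_span = prefix == "I" and tag == entity_type
        if not continues_span:
            if entity_type is not None:
                spans.add((entity_type, start, index))
            if prefix in ("B", "I"):  # a stray I- tag is treated as a span start
                start, entity_type = index, tag
            else:
                start, entity_type = None, None
    return spans


def entity_f1(gold_sentences, predicted_sentences):
    """Micro-averaged F1 over exact-match entity spans."""
    num_gold = num_predicted = num_correct = 0
    for gold, predicted in zip(gold_sentences, predicted_sentences):
        gold_spans = extract_spans(gold)
        predicted_spans = extract_spans(predicted)
        num_gold += len(gold_spans)
        num_predicted += len(predicted_spans)
        num_correct += len(gold_spans & predicted_spans)
    precision = num_correct / num_predicted if num_predicted else 0.0
    recall = num_correct / num_gold if num_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [["B-PER", "I-PER", "O", "B-LOC"]]
predicted = [["B-PER", "I-PER", "O", "O"]]
print(entity_f1(gold, predicted))  # precision 1.0, recall 0.5
```

Under this scheme a partially matched entity counts as wrong, which is one reason entity-level F1 is stricter than token-level accuracy.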