Structured Gradient Tree Boosting
Author: Yi Yang
Contact: yyang464@bloomberg.net
This is the Python implementation of the structured gradient tree boosting (SGTB) model
for collective named entity disambiguation, described in:
Yi Yang, Ozan Irsoy, and Kazi Shefaet Rahman,
"Collective Entity Disambiguation with Structured Gradient Tree Boosting",
NAACL 2018.
BibTeX
@inproceedings{yang2018collective,
  title={Collective Entity Disambiguation with Structured Gradient Tree Boosting},
  author={Yang, Yi and Irsoy, Ozan and Rahman, Kazi Shefaet},
  booktitle={Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)},
  volume={1},
  pages={777--786},
  year={2018}
}
The preprocessed AIDA-CoNLL data ('AIDA-PPR-processed.json') is available in the
data folder.
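The exact schema of the preprocessed file is not documented in this README, so the snippet below is only a minimal sketch for loading and inspecting it. It assumes the file sits at data/AIDA-PPR-processed.json and is either a single JSON document or JSON Lines (one object per line).

```python
import json

# Assumed path; adjust if your data folder lives elsewhere.
DATA_PATH = "data/AIDA-PPR-processed.json"

def load_aida_ppr(path=DATA_PATH):
    """Load the preprocessed AIDA-CoNLL data.

    The file is assumed to be either a single JSON document or
    JSON Lines (one object per line); both cases are handled.
    """
    with open(path, encoding="utf-8") as f:
        text = f.read()
    try:
        return json.loads(text)  # single JSON document
    except json.JSONDecodeError:
        # Fall back to JSON Lines.
        return [json.loads(line) for line in text.splitlines() if line.strip()]

if __name__ == "__main__":
    data = load_aida_ppr()
    print(type(data), len(data))
```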
The system also uses entity-entity features, which can be quickly computed
on the fly. Here, we provide pre-computed entity-entity features (3 features
per entity pair) for the AIDA-CoNLL dataset; they are available in the
data folder ('ent_ent_feats.txt.gz').
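The line format of the feature file is not spelled out in this README. The sketch below shows one way to read the gzipped file, under the assumption that each line lists an entity pair followed by its three feature values; adjust the parsing to the actual layout.

```python
import gzip

# Assumed location of the pre-computed features in the data folder.
FEATS_PATH = "data/ent_ent_feats.txt.gz"

def load_ent_ent_feats(path=FEATS_PATH):
    """Read pre-computed entity-entity features into a dict.

    Assumed (not documented here) line format:
        <entity_a> <entity_b> <feat_1> <feat_2> <feat_3>
    """
    feats = {}
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) < 5:
                continue  # skip blank or malformed lines
            ent_a, ent_b = parts[0], parts[1]
            feats[(ent_a, ent_b)] = [float(x) for x in parts[-3:]]
    return feats

if __name__ == "__main__":
    feats = load_ent_ent_feats()
    print(len(feats), "entity pairs loaded")
```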
You can reproduce the SGTB-BSG results by running:
python structured_learner.py --num-thread=16 --num-epoch=250
We obtained 95.32% accuracy on the test set. Training took 35 minutes using 16 threads.