Project author: bloomberg

Project description: Structured Gradient Tree Boosting
Language: Python
Repository: git://github.com/bloomberg/sgtb.git
Created: 2018-09-14T18:36:31Z
Project community: https://github.com/bloomberg/sgtb

License: Apache License 2.0



Structured Gradient Tree Boosting

Author: Yi Yang

Contact: yyang464@bloomberg.net

Basic description

This is the Python implementation of the structured gradient tree boosting model
for collective named entity disambiguation, described in

  Yi Yang, Ozan Irsoy, and Kazi Shefaet Rahman.
  "Collective Entity Disambiguation with Structured Gradient Tree Boosting."
  NAACL 2018.

[pdf]

BibTeX

  @inproceedings{yang2018collective,
    title={Collective Entity Disambiguation with Structured Gradient Tree Boosting},
    author={Yang, Yi and Irsoy, Ozan and Rahman, Kazi Shefaet},
    booktitle={Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)},
    volume={1},
    pages={777--786},
    year={2018}
  }

Data

The preprocessed AIDA-CoNLL
data (‘AIDA-PPR-processed.json’) is available in the data folder:

  • The entity candidates are generated based on the PPRforNED candidate
    generation system.
  • The system uses 19 local features, including 3 prior features, 4 NER features,
    2 entity popularity features, 4 entity type features, and 6 context features.
    See the paper for details.
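The grouping above can be sketched as a feature-vector layout. This is purely illustrative: the group names and feature order below are hypothetical, not the repository's actual identifiers.

```python
import numpy as np

# Illustrative grouping of the 19 local features described above.
# Group names and ordering are hypothetical; the repository's actual
# feature layout may differ.
LOCAL_FEATURE_GROUPS = {
    "prior": 3,        # candidate prior features
    "ner": 4,          # NER features
    "popularity": 2,   # entity popularity features
    "entity_type": 4,  # entity type features
    "context": 6,      # context features
}

def concat_local_features(groups):
    """Concatenate per-group feature arrays into one 19-dim local feature vector."""
    parts = [np.asarray(groups[name], dtype=float) for name in LOCAL_FEATURE_GROUPS]
    vec = np.concatenate(parts)
    assert vec.shape[0] == sum(LOCAL_FEATURE_GROUPS.values())  # 19 features total
    return vec
```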

The system also uses entity-entity features, which can be computed quickly
on the fly. Here, we provide pre-computed entity-entity features (3 features
per entity pair) for the AIDA-CoNLL dataset, available in the data folder
(‘ent_ent_feats.txt.gz’).

Reproduce results

You can reproduce the SGTB-BSG results by running:

  python structured_learner.py --num-thread=16 --num-epoch=250

I obtained 95.32% accuracy on the test set; training took about 35 minutes with 16 threads.
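For intuition, collective disambiguation with beam search over mentions can be sketched as follows. This is a minimal illustration, not the repository's implementation: the `score` function here is a stand-in for the boosted-tree scorer, which in SGTB combines the local and entity-entity features described above.

```python
def beam_search_decode(candidates, score, beam_size=4):
    """Decode mentions left to right, keeping the beam_size
    highest-scoring partial entity assignments.

    candidates: list of candidate-entity lists, one per mention.
    score: function(partial_assignment) -> float (stand-in for the
           boosted-tree scorer over local and entity-entity features).
    """
    beam = [((), 0.0)]  # (partial assignment, score)
    for cands in candidates:
        expanded = []
        for assign, _ in beam:
            for ent in cands:
                new = assign + (ent,)
                expanded.append((new, score(new)))
        # prune to the top-scoring partial assignments
        beam = sorted(expanded, key=lambda x: x[1], reverse=True)[:beam_size]
    return beam[0][0]  # best full assignment
```

With a toy score that rewards coherent (here, repeated) entities across adjacent mentions, the decoder prefers a globally consistent assignment over independently-made local choices.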