Project author: clips

Project description:
Repository for the CLiPS HAte speech DEtection System [HADES].
Language: Python
Project URL: git://github.com/clips/hades.git
Created: 2016-03-29T08:09:07Z
Project community: https://github.com/clips/hades

License: GNU Lesser General Public License v3.0

HADES

This is a work-in-progress repository for the CLiPS HAte speech DEtection System (HADES).

Currently, the repository contains the supplementary materials from the paper: “A Dictionary-based Approach to Racism Detection in Dutch Social Media”, presented at the TA-COS workshop at LREC 2016.

license

The dictionaries in this repository are available under a CC BY-SA 4.0 License.
If you use the dictionaries in your work, please cite:

  @inproceedings{tulkens2016a,
    title={A Dictionary-based Approach to Racism Detection in {Dutch} Social Media},
    author={Tulkens, St\'{e}phan and Hilte, Lisa and Lodewyckx, Elise and Verhoeven, Ben and Daelemans, Walter},
    booktitle={Proceedings of the LREC 2016 Workshop on Text Analytics for Cybersecurity and Online Safety (TA-COS)},
    year={2016},
    organization={European Language Resources Association (ELRA)}
  }

Note that we expanded the TA-COS submission into a journal paper, which was published in the CLIN Journal.

If you use the dictionary expansion techniques from this paper, please also consider citing it:

  @article{tulkens2016automated,
    title={The automated detection of racist discourse in dutch social media},
    author={Tulkens, St{\'e}phan and Hilte, Lisa and Lodewyckx, Elise and Verhoeven, Ben and Daelemans, Walter},
    journal={Computational Linguistics in the Netherlands Journal},
    volume={6},
    number={1},
    pages={3--20},
    year={2016}
  }

usage

The dictionaries are in .csv format.
The first field of each line is the category name; the remaining fields are the words belonging to that category.
Included is a Python script (compatible with 2.7 and 3.x) that reads the dictionaries and outputs relative frequencies.
It can be used for similar dictionaries, such as the LIWC dictionaries.
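
To make the file layout and the notion of relative frequency concrete, here is a minimal standalone sketch. It is not the script shipped in this repository: the function names `load_categories` and `relative_frequencies` are hypothetical, and the sketch assumes Python 3, comma-separated fields, exact matching against lowercased tokens, and "relative frequency" meaning category hits divided by the total number of tokens.

```python
import csv
import io
from collections import Counter


def load_categories(handle):
    """Read a dictionary file: first field is the category, the rest are its words."""
    categories = {}
    for row in csv.reader(handle):
        fields = [field.strip() for field in row if field.strip()]
        if not fields:
            continue  # skip blank lines
        categories[fields[0]] = set(fields[1:])
    return categories


def relative_frequencies(tokens, categories):
    """Return, per category, the share of tokens that occur in that category."""
    counts = Counter()
    for token in tokens:
        lowered = token.lower()  # assumption: matching is case-insensitive
        for name, words in categories.items():
            if lowered in words:
                counts[name] += 1
    total = len(tokens)
    return {name: (counts[name] / total if total else 0.0) for name in categories}


# Hypothetical two-category dictionary in the layout described above.
sample = io.StringIO("good,good,splendid\nbad,bad,useless\n")
categories = load_categories(sample)
scores = relative_frequencies("This stuff is splendid".split(), categories)
print(scores)
```

With this toy dictionary, one of the four tokens falls in the `good` category and none in `bad`. The same loader would work on any file that follows the category-first layout, under the assumptions stated above.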

example

  from dictfeaturizer import DictFeaturizer

  # Load from csv
  d = DictFeaturizer.load("expanded.csv")
  text = "this is an example text".split()
  score = d.transform(text)

  # Direct initialization
  direct = {"good": ["good", "splendid"], "bad": ["bad", "useless"]}
  d = DictFeaturizer(direct, relative=False)
  text = "This stuff is splendid".split()
  score_2 = d.transform(text)