Toolkit that simplifies corpus processing
Corpuscula is part of the RuMor project. It contains tools that simplify
corpus processing.
Corpuscula supports Python 3.5 or later. To install it via pip, run:
$ pip install corpuscula
If you currently have a previous version of Corpuscula installed, use:
$ pip install corpuscula -U
Alternatively, you can install Corpuscula from the source of this git
repository:
$ git clone https://github.com/fostroll/corpuscula.git
$ cd corpuscula
$ pip install -e .
This gives you access to examples and data that are not included in the
PyPI package.
After installation, you need to specify a directory where you prefer to store
downloaded corpora:
>>> import corpuscula.corpus_utils as cu
>>> cu.set_root_dir(<path>) # We will keep corpora here
NB: this call creates or updates the config file .rumor
in your home directory.
If you don’t set the root directory, Corpuscula keeps corpora
in the directory where it is installed.
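As a minimal sketch of this setup step, the snippet below creates a dedicated folder for corpora and then passes it to `set_root_dir`. The helper names `prepare_corpora_dir` and `configure_corpuscula` and the `corpora` folder name are hypothetical and not part of Corpuscula. Passing the path as a plain string is also an assumption.

```python
from pathlib import Path


def prepare_corpora_dir(base):
    """Create (if needed) and return a 'corpora' directory under `base`.

    Hypothetical helper for illustration; not part of Corpuscula.
    """
    root = Path(base).expanduser() / "corpora"
    root.mkdir(parents=True, exist_ok=True)
    return root


def configure_corpuscula(root):
    """Point Corpuscula at `root`; return False if it is not installed."""
    try:
        import corpuscula.corpus_utils as cu
    except ImportError:
        return False
    # set_root_dir is the call from the snippet above; passing a plain
    # string path is an assumption of this sketch.
    cu.set_root_dir(str(root))
    return True
```

For example, `configure_corpuscula(prepare_corpora_dir("~"))` would keep corpora in a `corpora` folder inside your home directory. Creating the folder up front is a design choice of this sketch, so that a mistyped or unwritable path fails before the config file is touched.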
You can find examples in the examples directory
of our Corpuscula GitHub
repository.
Corpuscula is released under the BSD License. See the
LICENSE file for
more details.