Author: kurpicz

Description: Text Corpus Collection
Language: C++
Repository: git://github.com/kurpicz/tcc.git
Created: 2017-01-26T12:51:23Z
Project page: https://github.com/kurpicz/tcc

License: BSD 2-Clause "Simplified" License

Text Corpus Collection (tcc)

This is a work in progress!

What is it?

This project provides simple tools to obtain (popular) text corpora that are used for benchmarks and tests.

What is it not?

We do not host any of the corpora. We just provide an easy way to get and/or compute them. Please visit the websites of the corpora for further information.

What is contained?

How to use it?

Run make download to download all files listed in the download configs, make random to generate random strings as defined in the config, and make processing to build all preprocessing tools.
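
To give a rough idea of what generating a random string for benchmarks can look like, here is a minimal C++ sketch. It is not taken from tcc: the function name random_text and its parameters (length, alphabet, seed) are hypothetical, and the config format that make random actually reads is not described here. The sketch only assumes that a random text is defined by a length, an alphabet, and a seed.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>

// Hypothetical helper, not part of tcc: draws `length` characters uniformly
// at random from `alphabet`. The seed is fixed by the caller so that a run
// can be repeated with the same toolchain.
std::string random_text(std::size_t length, const std::string& alphabet,
                        std::uint64_t seed) {
  std::string text;
  if (alphabet.empty()) {
    return text;  // nothing to draw from, return an empty text
  }
  text.reserve(length);
  std::mt19937_64 gen(seed);
  // Index into the alphabet; both bounds are inclusive.
  std::uniform_int_distribution<std::size_t> pick(0, alphabet.size() - 1);
  for (std::size_t i = 0; i < length; ++i) {
    text.push_back(alphabet[pick(gen)]);
  }
  return text;
}
```

For example, random_text(1000, "ACGT", 42) would yield a 1000-character text over a DNA-like alphabet. The exact characters produced for a given seed may differ between standard library implementations, so a benchmark that needs identical inputs everywhere would likely store the generated file instead of regenerating it.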