Author: yohasebe

Description:
WP2TXT extracts plain text from Wikipedia dump files (XML, compressed with Bzip2), stripping all MediaWiki markup and other metadata.
Language: Ruby
Repository: git://github.com/yohasebe/wp2txt.git
Created: 2012-06-06T02:56:02Z
Project page: https://github.com/yohasebe/wp2txt

License: MIT License
