Project author: StarlangSoftware

Project description:
Corpus processing library
Language: Python
Project URL: git://github.com/StarlangSoftware/Corpus-Py.git
Created: 2020-05-28T09:19:11Z
Project community: https://github.com/StarlangSoftware/Corpus-Py

License: GNU General Public License v3.0


Corpus

Video Lectures

For Developers

You can also see the Cython, Java, C, C++, Swift, JavaScript, or C# repositories.

Requirements

Python

To check if you have a compatible version of Python installed, use the following command:

  python -V

The latest version of Python is available from the official Python website (python.org).
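
The same check can be made from inside Python. The sketch below is only an illustration: the source does not say which Python versions are supported, so the minimum used here (ASSUMED_MINIMUM) is a hypothetical placeholder that should be replaced with the real requirement.

```python
import sys

# Hypothetical minimum version, chosen only for illustration;
# the source does not state which versions are supported.
ASSUMED_MINIMUM = (3, 6)


def python_is_at_least(minimum=ASSUMED_MINIMUM):
    """Return True if the running interpreter is at least `minimum`."""
    minimum = tuple(minimum)
    # Compare only as many components as the caller supplied.
    return sys.version_info[:len(minimum)] >= minimum


if __name__ == "__main__":
    # Print the interpreter version and whether it meets the assumed minimum.
    print(sys.version.split()[0], python_is_at_least())
```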

Git

Install the latest version of Git.

Pip Install

  pip3 install NlpToolkit-Corpus
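
To confirm that the install succeeded, the installed distribution can be looked up with the standard library. This is a minimal sketch, assuming Python 3.8 or later (where importlib.metadata is available); the helper name installed_version is illustrative and not part of the library.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints a version string after a successful pip install, otherwise None.
    print(installed_version("NlpToolkit-Corpus"))
```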

Download Code

To work on the code, create a fork from the GitHub page.
Use Git to clone the code to your local machine, or run the line below on Ubuntu:

  git clone <your-fork-git-link>

A directory called Corpus-Py will be created. Alternatively, to explore the code without forking, clone the repository below:

  git clone https://github.com/olcaytaner/Corpus-Py.git

Open project with PyCharm IDE

Steps for opening the cloned project:

  • Start the IDE
  • Select File | Open from the main menu
  • Choose the Corpus-Py directory
  • Select the "Open as project" option
  • After a couple of seconds, the dependencies will be downloaded.

Detailed Description

Corpus

To store a corpus in memory:

  a = Corpus("derlem.txt")

If the corpus text is delimited by periods but is not already split into sentences, pass a splitter as the second argument of the constructor:

  Corpus(self, fileName=None, splitterOrChecker=None)

To get the number of sentences in the corpus:

  sentenceCount(self) -> int

To get the i-th sentence in the corpus:

  getSentence(self, index: int) -> Sentence
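
The two accessors above are enough to walk through a corpus sentence by sentence. The sketch below shows that loop under stated assumptions: LineCorpus is a hypothetical stand-in that reads one sentence per line and returns plain strings, so the example runs without the library installed. It is not the library's Corpus class, which returns Sentence objects. The iter_sentences helper is also illustrative and relies only on the sentenceCount and getSentence signatures listed above.

```python
import os
import tempfile


class LineCorpus:
    """Hypothetical stand-in, NOT the library's Corpus class.

    It treats each non-empty line of the file as one sentence and
    exposes the same two accessor names described in the text.
    """

    def __init__(self, fileName):
        with open(fileName, encoding="utf-8") as f:
            self._sentences = [line.strip() for line in f if line.strip()]

    def sentenceCount(self):
        return len(self._sentences)

    def getSentence(self, index):
        return self._sentences[index]


def iter_sentences(corpus):
    """Yield every sentence of any object with sentenceCount/getSentence."""
    for i in range(corpus.sentenceCount()):
        yield corpus.getSentence(i)


def demo():
    """Write a two-sentence sample file and read it back."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "sample.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("Bu bir cümle.\nBu ikinci cümle.\n")
        return list(iter_sentences(LineCorpus(path)))


if __name__ == "__main__":
    for sentence in demo():
        print(sentence)
```

Because iter_sentences depends only on the two method names, the same loop should work on a real Corpus instance once the package is installed.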

TurkishSplitter

The TurkishSplitter class splits text into sentences according to the punctuation rules of Turkish.

  split(self, line: str) -> list
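
To make the shape of this interface concrete (one string in, a list of sentences out), here is a deliberately naive sketch. It is not TurkishSplitter's algorithm: it only breaks after ".", "!" or "?" when followed by whitespace, handles none of the Turkish-specific cases (abbreviations, numbers, quotations), and returns plain strings. The function name naive_split is hypothetical.

```python
import re

# Split at whitespace that directly follows sentence-final punctuation.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def naive_split(line):
    """Return a list of sentence strings from `line` (naive illustration)."""
    parts = _BOUNDARY.split(line.strip())
    # Drop empty fragments, e.g. when the input is an empty string.
    return [part for part in parts if part]


if __name__ == "__main__":
    for sentence in naive_split("Merhaba dünya. Nasılsın? İyiyim!"):
        print(sentence)
```

A splitter like this breaks on inputs such as "Dr. Ahmet geldi.", which is the kind of case a language-aware splitter is meant to handle.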