Project author: b00f

Project description:
Persian Spell Checking Dictionary
Primary language: HTML
Repository URL: git://github.com/b00f/lilak.git
Created: 2015-11-04T15:44:17Z
Project community: https://github.com/b00f/lilak

License: Other


Lilak, Persian Spell Checking Dictionary

Lilak is an open source project that generates a Persian dictionary for the hunspell spell checker, based on Persian morphology.

In Persian, affixes can change the meaning of a word. Some suffixes are short forms of verbs attached to a word. Part of speech plays an important role in Persian, and in some cases the pronunciation of a word changes the form of its suffixes. Check the code for more information.

Lilak has a lexicon of Persian words with part-of-speech tags. Lilak builds a dictionary for hunspell to predict the best form of compound words based on morphological rules.
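
To make the idea concrete, here is a minimal sketch of how a lexicon with part-of-speech tags could be turned into hunspell's .dic and .aff files. It is not lilak.py: the in-memory lexicon, the tag names, the flags N and A, and the suffix lists are all assumptions made for illustration, and real Persian morphology (for example zero-width non-joiner handling) is much richer. Only the `word/FLAG` and `SFX` line formats are hunspell's own.

```python
# Hypothetical lexicon entries: (word, part-of-speech tag).
# The real src/data/lexicon format may differ.
LEXICON = [
    ("کتاب", "noun"),       # "book"
    ("خوب", "adjective"),   # "good"
]

# Hypothetical rules: part of speech -> (hunspell flag, suffixes it allows).
SUFFIX_RULES = {
    "noun": ("N", ["ها"]),                 # plural marker
    "adjective": ("A", ["تر", "ترین"]),    # comparative, superlative
}


def build_dic(lexicon):
    """Return .dic content: a word count line, then one word/FLAG per line."""
    lines = [str(len(lexicon))]
    for word, pos in lexicon:
        rule = SUFFIX_RULES.get(pos)
        # Words whose tag has no rule are listed without a flag.
        lines.append(f"{word}/{rule[0]}" if rule else word)
    return "\n".join(lines) + "\n"


def build_aff(rules):
    """Return .aff content with one SFX class per part of speech."""
    out = ["SET UTF-8"]
    for flag, suffixes in rules.values():
        # Header: SFX <flag> <cross product Y/N> <number of rules>
        out.append(f"SFX {flag} Y {len(suffixes)}")
        for suffix in suffixes:
            # Rule: SFX <flag> <strip (0 = nothing)> <suffix> <condition (. = any)>
            out.append(f"SFX {flag} 0 {suffix} .")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(build_dic(LEXICON))
    print(build_aff(SUFFIX_RULES))
```

Writing the two strings to files that share a base name, one with a .dic and one with an .aff extension, would give hunspell a small dictionary in which each tagged word is accepted both with and without the suffixes allowed for its part of speech.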

Content

  lilak
  |-- build : Build folder. Compiled dictionary goes here.
  |
  |-- src
  | |-- data
  | | |-- lexicon : Lexicon of Persian words with part-of-speech tags
  | | |-- affixes : Affix (prefix or suffix) rules
  | | |-- dic_users : List of words without POS tag.
  | | \-- verbs.htm : List of Persian verbs (unstemmed)
  | |
  | |-- lilak.py : Python script for building lilak dictionary
  | \-- test.py : Python script to test lilak accuracy
  |
  |-- test
  | |-- text1 : "Farsi(Persian) is Sugar", A short story by Mohammad-Ali Jamalzadeh
  | |-- text2 : "A Hekayat" By Saadi
  | |-- text3 : "A Ghazal" By Hafez
  | |-- text4 : "Yazdgerd Kingdom" By Ferdowsi
  | |-- text5 : "A Ghazal" By Muhammad Husayn Tabataba'i
  | |-- text6 : "Have a Safe Trip" A poem by Shafii Kadkani
  | |-- text7 : "Se Tar" A short story by Jalal Al-e-Ahmad
  | |-- text8 : "End of Shahname" By Mehdi Akhavan-Sales
  | |-- text9 : "The Water's Footsteps" By Sohrab Sepehri
  | |-- text10 : "Nei Name" By Rumi
  | \-- verbs : Some inflected verbs
  |
  |-- README.md
  \-- LICENCE : License file

Building Dictionary

Before using lilak, please make sure you have installed Python 3.x.

To build the lilak dictionary with lilak.py and then test it, run the following from the src folder:

  make build
  make test

You can find the compiled dictionary in the build folder.

Check result.log for the test results.
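
For a quick manual check of a compiled dictionary, a sketch along the following lines may help. It is not the project's test.py. It assumes the hunspell command-line tool is installed and uses its `-d` (dictionary), `-i` (input encoding) and `-l` (list misspelled words) options. The dictionary file name inside the build folder is not stated here, so `dict_base` is left as a parameter, and `accuracy` is a helper name chosen for this example.

```python
import subprocess


def misspelled_words(words, dict_base):
    """Return the words that hunspell rejects.

    dict_base is the dictionary path without the .dic/.aff extension.
    Requires the hunspell command-line tool on PATH.
    """
    result = subprocess.run(
        ["hunspell", "-d", dict_base, "-i", "utf-8", "-l"],
        input="\n".join(words) + "\n",
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    # With -l, hunspell prints one rejected word per line.
    return [line for line in result.stdout.splitlines() if line.strip()]


def accuracy(total, rejected):
    """Fraction of checked words that the dictionary accepted."""
    if total == 0:
        return 0.0
    return (total - rejected) / total
```

Given a list of words taken from one of the texts in the test folder, `accuracy(len(words), len(misspelled_words(words, dict_base)))` gives the share of words the dictionary recognizes.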

How to contribute

The best way to contribute to this project is to collect words with correct part-of-speech tags.
Part of speech is important for building Lilak. Each word should be classified under a main type such as verb, noun, or adjective. Other tags are also useful, such as verb tense or singular versus plural.
Check src/data/lexicon for more information.
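
As an illustration of the kind of information a contributed word carries, the sketch below models one entry with a main part-of-speech tag plus optional extra tags, and reports obvious problems before submission. The class name, the tag names, and the feature keys are assumptions made for this example. The authoritative format is whatever src/data/lexicon uses.

```python
from dataclasses import dataclass, field

# Hypothetical set of main tags; the real lexicon may use other names.
MAIN_TAGS = {"verb", "noun", "adjective"}


@dataclass
class LexiconEntry:
    word: str
    pos: str                                        # main part-of-speech tag
    features: dict = field(default_factory=dict)    # e.g. {"number": "plural"}

    def problems(self):
        """Return a list of human-readable issues; empty if none were found."""
        issues = []
        if not self.word.strip():
            issues.append("empty word")
        if self.pos not in MAIN_TAGS:
            issues.append(f"unknown part-of-speech tag: {self.pos}")
        return issues


entry = LexiconEntry("کتاب", "noun", {"number": "singular"})
print(entry.problems())
```

Keeping the main tag separate from the extra tags means new details such as tense or number can be added later without changing how entries are grouped.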

Please open an issue if you find any mistakes while using lilak.

Using Lilak

  • You can find compiled dictionaries here.
  • Mozilla Firefox: Install the lilak extension from here.
  • Google Chrome: Go to Settings, find Language and input settings, add the Persian language, and make sure the spell checker option is enabled.

Supporting Lilak

If you like this project, please donate or consider becoming a patron.

License

Lilak is published under the Apache License. You may freely use, reproduce, modify, or distribute it. If you think lilak is useful, please support it.

About the Name

The English word lilac comes from French lilac (“shrub of genus Syringa with mauve flowers”), from Spanish lilac, from Arabic lilak, from Persian lilak, a variant of nilak (“bluish”).

In Memory of Abolhassan Najafi

Abolhassan Najafi was an associate member of Iran’s Academy of Persian Language and Literature. His most famous book is “Ghalat Nanevisim” (Let’s Not Write Incorrectly).

Thanks

Special thanks to