Project author: mondain-dev

Project description:
Chinese Historical Phonology
Language: Python
Repository: git://github.com/mondain-dev/chn-hist-phon.git
Created: 2018-12-18T05:04:52Z
Project community: https://github.com/mondain-dev/chn-hist-phon

License: MIT License


ChnHistPhon - Chinese Historical Phonology

Experiments in Chinese Historical Phonology using matrix decomposition and factorization methods.

Prerequisites

We use Python to prepare our data. The following packages are required:

In addition to cjklib, the Unihan Database is used. The latest Unihan.zip can be downloaded from https://www.unicode.org/Public/UCD/. Unzip it to /path/to/Unihan.
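
For orientation, the unzipped Unihan text files are tab-separated, one "code point, field, value" triple per line, with comment lines starting with "#". The following is a minimal sketch of reading that layout using only the standard library. The function name parse_unihan_lines and the choice of fields are assumptions for illustration; this README does not say which fields the repository's scripts actually read.

```python
from collections import defaultdict


def parse_unihan_lines(lines, fields=None):
    """Map each character to {field: value} from Unihan-format lines.

    Hypothetical helper for illustration; not taken from the repository.
    """
    data = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        codepoint, field, value = line.split("\t", 2)
        if fields is not None and field not in fields:
            continue  # keep only the requested fields
        char = chr(int(codepoint[2:], 16))  # "U+4E00" -> the character itself
        data[char][field] = value
    return dict(data)


# Illustrative sample lines in the Unihan tab-separated layout.
sample = [
    "# sample lines for illustration",
    "U+4E00\tkMandarin\tyī",
    "U+4E00\tkCantonese\tjat1",
    "U+4E8C\tkMandarin\tèr",
]
readings = parse_unihan_lines(sample, fields={"kMandarin", "kCantonese"})
```

With a real download, the same function could be fed an open file, for example open("/path/to/Unihan/Unihan_Readings.txt", encoding="utf-8").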

Running experiments

Prepare data

Once you have cloned this repository to your local /path/to/ChnHistPhon, you can run

  python /path/to/ChnHistPhon/ChnHistPhon_1_data_preparation.py

which will create ChnCharData.csv, the dataset of Chinese characters we need, in /path/to/ChnHistPhon/results.

Perform low-rank SVD

We use softImpute (Mazumder et al., 2010) to complete the data matrix in ChnCharData.csv, and then apply dictionary learning and sparse coding to the completed matrix. These steps are run by ChnHistPhon_2_run_SoftImpute_DictionaryLearning.py.
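
To make the completion step concrete, here is a minimal NumPy sketch of the basic soft-impute iteration: fill the missing entries with the current estimate, take an SVD, shrink the singular values by a threshold, and repeat. It is not the repository's code. The function name soft_impute, the parameter lam, the stopping rule, and the toy matrix are all assumptions for illustration, and the repository's script may rely on a different implementation. The dictionary learning and sparse coding step is not covered.

```python
import numpy as np


def soft_impute(X, lam, n_iters=200, tol=1e-6):
    """Estimate a low-rank completion of X, where NaN marks missing entries.

    Illustrative sketch of iterative soft-thresholded SVD; names are hypothetical.
    """
    observed = ~np.isnan(X)
    Z = np.zeros_like(X, dtype=float)
    for _ in range(n_iters):
        # Keep observed entries; use the current estimate for missing ones.
        filled = np.where(observed, X, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)  # soft-threshold the singular values
        Z_new = (U * s) @ Vt
        # Relative change between successive estimates, used as a stopping rule.
        change = np.linalg.norm(Z_new - Z) / max(np.linalg.norm(Z), 1e-12)
        Z = Z_new
        if change < tol:
            break
    return Z


# Toy example: a rank-1 matrix with two entries hidden.
rng = np.random.default_rng(0)
truth = np.outer(rng.normal(size=6), rng.normal(size=5))
X = truth.copy()
X[0, 1] = X[3, 2] = np.nan
Z = soft_impute(X, lam=0.01)
```

A larger lam shrinks more singular values to zero and so yields a lower-rank estimate. A threshold larger than the biggest singular value returns the zero matrix.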

Results

The results can be viewed here.