Project author: Jd8111997

Project description: Speech enhancement module based on SEGAN
Language: Python
Repository: git://github.com/Jd8111997/Speech-Enhancer.git
Created: 2020-08-16T17:50:28Z
Project page: https://github.com/Jd8111997/Speech-Enhancer

License: MIT License


Speech-Enhancer

Speech enhancement module based on the SEGAN paper.

Requirements

  • python v3.5.2 or higher
  • librosa
  • SoX
  • pytorch v0.4.0

Dataset

I used a toy noisy speech dataset from the University of Edinburgh.
Download the training and test sets, extract them into a directory of your choice, and set root_dir in the preprocess_final module to that path.

You can use another dataset, as long as you change the path in preprocess_final accordingly.

Pre-processing

You can change other parameters, such as sample_rate and window_size, in preprocess_final.
The default sample rate is 8 kHz and the default window_size is 8192. To downsample the audio to a different rate, change sample_rate in downsample_8k.
By default, the preprocessed datasets are written to your current working directory.
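
The windowing step can be pictured with a minimal sketch. This is not the code in preprocess_final: the function name slice_signal and its stride parameter are hypothetical, and zero-padding the last window is an assumption. Only the default window_size of 8192 comes from the text above.

```python
def slice_signal(samples, window_size=8192, stride=8192):
    """Split a 1-D sequence of samples into fixed-length windows.

    Hypothetical helper, not part of the repository. A short trailing
    window is zero-padded (an assumption) so every window has the
    same length.
    """
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    windows = []
    for start in range(0, len(samples), stride):
        window = list(samples[start:start + window_size])
        # Pad with zeros when fewer than window_size samples remain.
        window.extend([0.0] * (window_size - len(window)))
        windows.append(window)
    return windows


if __name__ == "__main__":
    # One second of silence at 8 kHz stands in for real audio.
    one_second = [0.0] * 8000
    print(len(slice_signal(one_second)))
```

Passing a stride smaller than window_size gives overlapping windows; whether the repository overlaps its training windows is not stated here.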

Training

  python main.py --batch_size 128 --num_epochs 86

  optional arguments:
    --batch_size   train batch size [default value is 128]
    --num_epochs   train epochs number [default value is 12]

Every four epochs, the test results and model weights are saved in segan_data_out.
As with pre-processing, adjust the paths and parameters in main.py to suit your needs.
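
For readers who want to see how the two training options could be wired up, here is a minimal sketch using the standard argparse module. It is an illustration and not the actual contents of main.py: the function name parse_args is hypothetical, and only the flag names and default values are taken from the list above.

```python
import argparse


def parse_args(argv=None):
    """Parse the two training options described in this README.

    Hypothetical helper; main.py may define its options differently.
    """
    parser = argparse.ArgumentParser(description="Train the speech enhancer")
    parser.add_argument("--batch_size", type=int, default=128,
                        help="train batch size")
    parser.add_argument("--num_epochs", type=int, default=12,
                        help="train epochs number")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args.batch_size, args.num_epochs)
```

Running the sketch with no flags falls back to the defaults, while passing --num_epochs 86 overrides the epoch count as in the command above.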

Enhancing audio

  python Generate_audio.py --file_name p212_982.wav --time_stamp 20200816_0531 --state state-5.pkl

Pass a noisy audio clip to Generate_audio as file_name, and set time_stamp and state to match a run and saved state in segan_data_out (created by main.py) so that the saved generator weights can be loaded.
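
The sketch below shows one way the time_stamp and state values could be combined into a checkpoint path. The directory layout segan_data_out/<time_stamp>/<state> is an assumption, and checkpoint_path is a hypothetical helper. Check Generate_audio.py for the layout the project really uses.

```python
import os


def checkpoint_path(time_stamp, state, out_dir="segan_data_out"):
    """Build the path to a saved generator state.

    Assumes the layout out_dir/time_stamp/state, which this README
    does not confirm.
    """
    return os.path.join(out_dir, time_stamp, state)


if __name__ == "__main__":
    # Values taken from the example command above.
    print(checkpoint_path("20200816_0531", "state-5.pkl"))
```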