# MelNet

Implementation of "MelNet: A Generative Model for Audio in the Frequency Domain"
## Setup

Install the dependencies:

```
pip install -r requirements.txt
```
## How to train

Config YAML files are provided under `config/`. For other datasets, fill out your own YAML file according to the provided ones, and set `data.extension` within the YAML file to match your audio files' extension.

```
python trainer.py -c [config YAML file path] -n [name of run] -t [tier number] -b [batch size] -s [TTS]
```
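As a sketch, a dataset entry in such a YAML file could look like the following — every field name here is hypothetical except `data.extension`, which the instructions above reference:

```yaml
data:
  path: 'datasets/my_dataset'  # hypothetical field: root directory of the audio files
  extension: '*.wav'           # data.extension: pattern matching your audio files
```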
- The `-s` flag is a boolean that determines whether to train a TTS tier. Since a TTS tier only differs at tier 1, this flag is ignored when `[tier number] != 1`.
- Warning: this flag is toggled `True` no matter what follows it on the command line. Omit it entirely if you are not planning to use TTS.
- Checkpoints are saved under `chkpt/`.
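The warning about `-s` matches a common `argparse` pitfall: a flag declared with `type=bool` converts its argument with `bool()`, and any non-empty string — including `"False"` — is truthy. The parser below is an illustrative sketch of that behavior, not the repo's actual code:

```python
import argparse

# Hypothetical parser mirroring the trainer's -s flag.
parser = argparse.ArgumentParser()
parser.add_argument('-s', '--tts', type=bool, default=False)

# bool('False') is True because the string is non-empty,
# so passing any value after -s enables the flag.
args = parser.parse_args(['-s', 'False'])
print(args.tts)  # True
```

This is why the safe choice is simply not to pass `-s` at all when TTS is not wanted.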
## How to sample

An `inference.yaml` must be provided under `config/`. It must specify the number of tiers, the names of the checkpoints, and whether or not the generation is conditional.

```
python inference.py -c [config YAML file path] -p [inference YAML file path] -t [timestep of generated mel spectrogram] -n [name of sample] -i [input sentence for conditional generation]
```
- The ratio of `[timestep of generated mel spectrogram]` to the length of the generated audio in seconds is `[sample rate] : [hop length of FFT]`.
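Under that ratio, the audio length for a given timestep works out as below — the 22050 Hz sample rate and 256-sample hop length are assumed example values, not the repo's defaults:

```python
def audio_duration_sec(timestep: int, sample_rate: int, hop_length: int) -> float:
    """Seconds of audio corresponding to a mel spectrogram with
    `timestep` frames, given the STFT sample rate and hop length."""
    return timestep * hop_length / sample_rate

# e.g. 200 frames at 22050 Hz with hop length 256:
print(round(audio_duration_sec(200, 22050, 256), 2))  # 2.32
```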
- The `-i` flag is optional and only needed for conditional generation. Surround the sentence with `""` and end it with `.` (a period).
## License

MIT License