TiramisuSELD :cake:

Sound Event Localization and Detection in TensorFlow 2

TiramisuSELD implements several sound event localization and detection (SELD) architectures.

Requirements

Python 3.6+
TensorFlow 2.2+: pip install tensorflow
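To check the two minimums above from Python, a small helper along these lines can be used. This is a minimal sketch, not part of TiramisuSELD: `meets_minimum` and `check_requirements` are hypothetical names, and the parsing assumes the version string starts with `major.minor`, as release and tf-nightly version strings do.

```python
import re
import sys


def meets_minimum(version: str, minimum: tuple) -> bool:
    """Return True if a 'major.minor...' version string is >= minimum (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: {!r}".format(version))
    # Compare numerically so that e.g. 2.10 counts as newer than 2.2.
    return (int(match.group(1)), int(match.group(2))) >= minimum


def check_requirements(tf_version: str) -> list:
    """Hypothetical helper: collect problems; an empty list means both minimums are met."""
    problems = []
    # str.format instead of f-strings so this file still parses on Python < 3.6.
    if sys.version_info[:2] < (3, 6):
        problems.append("Python 3.6+ is required")
    if not meets_minimum(tf_version, (2, 2)):
        problems.append("Tensorflow 2.2+ is required, found {}".format(tf_version))
    return problems
```

With TensorFlow installed, calling `check_requirements(tf.__version__)` after `import tensorflow as tf` returns an empty list when both requirements are satisfied.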

Setup Environment and Datasets

Install TensorFlow: pip3 install tensorflow, or pip3 install tf-nightly if you need TFLite support

Install packages: python3 setup.py install

To enable XLA, set the flag when launching the training script: TF_XLA_FLAGS=--tf_xla_auto_jit=2 $python_train_script
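When training is launched from another Python process (for example a small sweep script), the same environment variable can be set programmatically. A minimal sketch, assuming the training entry point is a plain Python script; `xla_env` and `run_with_xla` are hypothetical helpers, not TiramisuSELD functions. Setting the variable on a child process avoids depending on when TensorFlow reads it in the current process.

```python
import os
import subprocess
import sys

XLA_FLAG = "--tf_xla_auto_jit=2"


def xla_env(base_env=None):
    """Hypothetical helper: copy of an environment with XLA auto-jit added to TF_XLA_FLAGS."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("TF_XLA_FLAGS", "").strip()
    # Append rather than overwrite so other flags already present are kept.
    if XLA_FLAG not in existing:
        existing = "{} {}".format(existing, XLA_FLAG).strip()
    env["TF_XLA_FLAGS"] = existing
    return env


def run_with_xla(script, *args):
    """Hypothetical helper: run a training script in a child process with XLA enabled."""
    cmd = [sys.executable, script, *args]
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    return subprocess.run(cmd, env=xla_env(), check=True)
```

For example, `run_with_xla("path/to/train_script.py")` is equivalent to the shell command above; the path is a placeholder, and any extra arguments are passed through to the script unchanged.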

Clean up: python3 setup.py clean --all (this removes the contents of the build/ directory)

Training & Testing

Example YAML Config Structure

speech_config: ...
model_config: ...
decoder_config: ...
learning_config:
  augmentations: ...
  dataset_config:
    train_paths: ...
    eval_paths: ...
    test_paths: ...
    tfrecords_dir: ...
  optimizer_config: ...
  running_config:
    batch_size: 8
    num_epochs: 20
    outdir: ...
    log_interval_steps: 500
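Once the file is loaded into a dictionary (for example with PyYAML's `yaml.safe_load`), the structure above can be sanity-checked and the `running_config` values turned into a rough training schedule. This is a sketch under stated assumptions: `missing_keys` and `training_schedule` are hypothetical helpers, every section shown in the example is treated as required (the project's own loader may be more lenient), and each epoch is assumed to keep its final partial batch.

```python
import math

# Assumption: every section shown in the example YAML is required.
REQUIRED_SECTIONS = ("speech_config", "model_config", "decoder_config", "learning_config")
REQUIRED_LEARNING = ("augmentations", "dataset_config", "optimizer_config", "running_config")


def missing_keys(config: dict) -> list:
    """Hypothetical helper: dotted paths of expected sections absent from a loaded config."""
    missing = [key for key in REQUIRED_SECTIONS if key not in config]
    learning = config.get("learning_config") or {}
    missing += ["learning_config." + key for key in REQUIRED_LEARNING if key not in learning]
    return missing


def training_schedule(num_examples: int, running_config: dict) -> dict:
    """Hypothetical helper: steps per epoch, total steps and log events from running_config."""
    # Assumption: the last, partial batch of each epoch is kept, hence ceil.
    steps_per_epoch = math.ceil(num_examples / running_config["batch_size"])
    total_steps = steps_per_epoch * running_config["num_epochs"]
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "log_events": total_steps // running_config["log_interval_steps"],
    }
```

For a hypothetical training set of 4,000 examples, the values above (batch_size: 8, num_epochs: 20, log_interval_steps: 500) give 500 steps per epoch, 10,000 steps in total, and one log entry per epoch.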

See examples for some predefined SELD models.

References & Credits

https://github.com/pquochuy/dcase2020-seld